Comment by tomasello77

1 year ago

I tried using Gemini 2.0 Flash for PDF-to-Markdown parsing of scientific papers after having good results with GPT-4o, but the experience was terrible.

When I sent images of PDF pages together with the extracted text, Gemini mixed headlines with body text, parsed tables incorrectly, and sometimes split tables, placing one part at the top of the page and the rest at the bottom. It also added random numbers (like inserting an “8” for no reason).
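
A rough way to catch the stray-number problem automatically is to compare the numeric tokens in the model's Markdown against the text extracted from the same page. The sketch below is only an illustration, not the setup used for the tests above: the function name `unexpected_numbers` and both of its parameters are hypothetical, and it assumes you already have the page's extracted text and the model output as plain strings.

```python
import re
from collections import Counter

# Matches integers and simple decimals such as "8" or "0.75".
NUMBER = re.compile(r"\d+(?:\.\d+)?")


def unexpected_numbers(extracted_text: str, model_markdown: str) -> list[str]:
    """Return numeric tokens that occur more often in the model's Markdown
    than in the text extracted from the page (hypothetical helper)."""
    source_counts = Counter(NUMBER.findall(extracted_text))
    output_counts = Counter(NUMBER.findall(model_markdown))
    # Counter subtraction keeps only tokens with a positive surplus.
    surplus = output_counts - source_counts
    return sorted(surplus.elements())


if __name__ == "__main__":
    page_text = "Table 2 shows 0.75"
    markdown = "8 Table 2 shows 0.75"
    print(unexpected_numbers(page_text, markdown))
```

This will also flag numbers the model adds legitimately, such as Markdown list numbering, so it works as a prompt for manual review rather than a pass/fail check.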

When using the Gemini SDK to process full PDFs, Gemini 1.5 could handle them, but Gemini 2.0 only processed the first page. Worse, both versions completely ignored tables.

Among the Gemini models, 1.5 Pro performed the best, reaching about 80% of GPT-4o’s accuracy with image parsing, but it still introduced numerous small errors.

In conclusion, no Gemini model is reliable for PDF-to-Markdown parsing, and despite the hype I still need to use GPT-4o.
