Comment by password4321

15 days ago

As opposed to the discussion 2 days ago with 400+ comments:

Ingesting PDFs and why Gemini 2.0 changes everything

https://news.ycombinator.com/item?id=42952605

FTA:

> This week, there was a viral blog about Gemini 2.0 being used for complex PDF parsing, leading many to the same hypothesis we had nearly a year ago at this point. Data ingestion is a multistep pipeline, and maintaining confidence from these nondeterministic outputs over millions of pages is a problem.

That's what I thought too, but apparently the title is pure, absolute, rage-inducing clickbait.

The actual conclusion is that LLMs make classes of errors that traditional OCR programs either don't make or make in different ways.