Comment by simonw

13 hours ago

Lots of comments in here that seem to have missed that this is about using vision-LLMs for OCR.

This makes it a slightly different issue from "hallucination" as seen in text based models. The model (which I think we can assume is GPT-5-mini in this case) is being fed scanned images of PDFs and is incorrectly reading the data from them.

Is this still a hallucination? I've been unable to identify a robust definition of that term, so it's not clearly wrong to call a model misinterpreting a document a "hallucination" even though it feels to me like a different category of mistake to an LLM inventing the title of a non-existent paper or lawsuit.

6 comments

simonw

fxwin 11 hours ago

> this is about using vision-LLMs for OCR

Is it? To me it sounds like they do OCR first, then extract from the result with LLM:

"Cortex uses automated extraction (optical character recognition (OCR) and natural language processing (NLP)) to parse clerkship grades from medical school transcripts."

simonw 11 hours ago

See comment I just posted here: https://news.ycombinator.com/item?id=45582982

lysecret 13 hours ago

These kinds of errors have always existed and will always exist there is no perfect way to extract info from documents like this.

simonw 13 hours ago
The models really are getting better though. Compare Gemini 1.5 and Gemini 2.5 on the same PDF document (I've done this a bunch) and you can see the difference.
The open question is how much better they need to get before they can be deployed for situations like this that require a VERY high level of reliability.
- lysecret 13 hours ago
  
  I fully agree. My point was more a lot of commenters seem or implicitly compare the llm based approach with some “better” or “simpler” approach which really doesn’t exist from my estimation LLMs are sota for this kind of extractions (though they still have issues).
- hoosieree 12 hours ago
  
  People don't respect the chasm between "obviously no mistakes" and "no obvious mistakes".