
Comment by kbumsik

3 months ago

> My impression is that OCR is basically solved at this point.

Not really, in my experience. In particular, models still struggle with detecting table structure.

This.

Complex tables with parent/child spanning-cell relationships are still extracted with low accuracy.

Try the reverse: take a picture of a complex table and ask ChatGPT 5, Claude Opus 3.1, or Gemini 2.5 Pro to produce an HTML table.

They will fail.
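If you want to try this yourself, here is a minimal sketch of the "picture of a table → HTML" test described above, assuming the OpenAI Python SDK; the model name and prompt wording are my own illustrative choices, not something specified in the thread.

```python
# Sketch of the test: send a scan/screenshot of a complex table to a
# vision-capable model and ask it to reconstruct the table as HTML.
# Assumes `pip install openai` and OPENAI_API_KEY in the environment;
# the model name and prompt are illustrative assumptions.
import base64
from openai import OpenAI

client = OpenAI()

def table_image_to_html(image_path: str) -> str:
    """Ask a vision model to reconstruct an image of a table as an HTML <table>."""
    with open(image_path, "rb") as f:
        image_b64 = base64.b64encode(f.read()).decode("utf-8")

    response = client.chat.completions.create(
        model="gpt-4o",  # any vision-capable model
        messages=[
            {
                "role": "user",
                "content": [
                    {
                        "type": "text",
                        "text": (
                            "Reconstruct this table as an HTML <table>. "
                            "Preserve merged cells exactly, using rowspan/colspan."
                        ),
                    },
                    {
                        "type": "image_url",
                        "image_url": {"url": f"data:image/png;base64,{image_b64}"},
                    },
                ],
            }
        ],
    )
    return response.choices[0].message.content

# Compare the rowspan/colspan structure in the output against the source
# document; the failure mode described above is in the merged-cell layout,
# not the cell text.
# print(table_image_to_html("complex_table.png"))
```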

  • Maybe my imagination is limited or our documents aren't complex enough, but are we talking about realistic written documents? I'm sure you can take a screenshot of a very complex spreadsheet and it fails, but in that case you already have the data in structured form anyway, no?

    • Now if someone mails or faxes you that spreadsheet? You're screwed.

      Spreadsheets are not the biggest problem though, as they have a reliable 2-dimensional grid - at worst some cells will be combined. The form layouts and n-dimensional table structures you can find on medical and insurance documents are truly unhinged. I've seen documents that I struggled to interpret.


I mentioned this when the new Qwen model dropped: I have a stack of construction invoices that fail with both traditional OCR and OpenAI's models.

It's a hard (and very interesting) problem space.