
Comment by cedws

5 months ago

90% accuracy +/- 10%? What could that be useful for? That's awfully low.

> accuracy is measured with the Needleman-Wunsch algorithm

> Crucially, we’ve seen very few instances where specific numerical values are actually misread. This suggests that most of Gemini’s “errors” are superficial formatting choices rather than substantive inaccuracies. We attach examples of these failure cases below [1].

> Beyond table parsing, Gemini consistently delivers near-perfect accuracy across all other facets of PDF-to-markdown conversion.

That seems fairly useful to me, no? Maybe not for mission critical applications, but for a lot of use cases, this seems to be good enough. I'm excited to try these prompts on my own later.

This is "good enough" for banks to use when doing due diligence. You'd be surprised how much noise is in the system with the current state of the art: algorithms/web scrapers and entire buildings of humans in places like India.

Author here — measuring accuracy in table parsing is surprisingly challenging. Subtle, almost imperceptible differences in how a table is parsed may not affect the reader's understanding but can significantly impact benchmark performance. For all practical purposes, I'd say it's near perfect (also keep in mind the benchmark is on very challenging tables).
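The author's point about formatting differences dominating the score follows directly from how Needleman-Wunsch works: it aligns the two texts character by character and penalises every gap, so extra whitespace or pipe placement in a table costs the same as a wrong digit. A minimal sketch (my own illustration; the post's actual scoring, weights, and normalisation are assumptions here):

```python
# Hypothetical sketch of a Needleman-Wunsch-based accuracy metric.
# The match/mismatch/gap weights and the normalisation are my own
# choices, not necessarily what the benchmark used.

def needleman_wunsch_score(a: str, b: str,
                           match: int = 1, mismatch: int = -1,
                           gap: int = -1) -> int:
    """Classic global-alignment score between two strings via DP."""
    m, n = len(a), len(b)
    prev = [j * gap for j in range(n + 1)]  # row for empty prefix of a
    for i in range(1, m + 1):
        curr = [i * gap] + [0] * n
        for j in range(1, n + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr[j] = max(diag, prev[j] + gap, curr[j - 1] + gap)
        prev = curr
    return prev[n]

def similarity(reference: str, parsed: str) -> float:
    """Normalise the alignment score to [0, 1] against a perfect match."""
    if not reference and not parsed:
        return 1.0
    best = max(len(reference), len(parsed))
    return max(0.0, needleman_wunsch_score(reference, parsed)) / best

# Two table rows with identical numbers but different whitespace:
# every value is "correct", yet the similarity drops well below 1.0.
ref = "| Revenue | 1,234 |"
out = "|Revenue|1,234|"
print(similarity(ref, out))   # noticeably less than 1.0
print(similarity(ref, ref))   # exactly 1.0
```

This is why a parse that a human would call perfect can still benchmark in the low 90s: pure formatting drift eats into the score even when no value is misread.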

Having seen some of these tables, I would guess that's probably above a layperson's score. Some are very complicated or just misleadingly structured.