Comment by iudqnolq

17 days ago

In what contexts is 0.84 ± 0.16 actually "nearly perfect"?

I think they meant "nearly perfect" relative to the best competing approach, which is Reducto’s own, given that they created the benchmark:

Reducto's own model currently outperforms Gemini Flash 2.0 on this benchmark (0.90 vs 0.84). However, as we review the lower-performing examples, most discrepancies turn out to be minor structural variations that would not materially affect an LLM’s understanding of the table.