Comment by gpm

4 hours ago

One of the many things Google was pitching today is that they're going to run things like google search with access to linux container environments to do things like run tool calls... which will presumably be able to rasterize SVGs and show them to the model.

But Simon says he runs these through the API without tool access specifically to prevent that sort of "cheating". I.e. it's an LLM benchmark not an LLM+Harness benchmark.