Comment by simonw

5 months ago

I find the framing of this story quite frustrating.

The purpose of new benchmarks is to gather tasks that today's LLMs can't solve comprehensively.

If an AI lab built a benchmark that their models scored 100% on, they would have been wasting everyone's time!

Writing a story that effectively says "ha ha ha, look at OpenAI's models failing to beat the new benchmark they created!" is a complete misunderstanding of the research.

Shhh ... you're spoiling everybody's confirmation bias against LLMs. They are obviously terrible at coding, just as we have known all along, and everybody should laugh at them. Nothing to see here!

  • As long as these companies keep pretending AI is ready to replace humans, I will be biased against lies, thank you.

  • Since you are one of the cool kids in the know, can you share the road map to profitability and, even better, the expected/hyped ROI? Without extrapolations into science fiction, please.