Comment by andrewla

2 months ago

Why on earth would they do this? This is not a fundamentally useful task; it serves as a measure of the LLM's ability to generalize to tasks outside of its training data and that strain the limits of what it can express.

Because optics matter. They are all ultimately fundraising and competing, and this is terrible PR.

Ask Jeeves from 1997 could answer this question, so tell me why we need to devote a nation-state's worth of compute power to feed an “AI” that confidently gets kindergarten-level questions dead-ass wrong?

I have the same kind of question when I watch the AI summary on Google output tokens one by one to give me less useful information than what is right there in the first search result from Wikipedia (fully sourced, too).

If you’re advertising that your new LLM is like a PhD in your pocket, and it fails on a task that a first grader can do, it makes it hard to take your other claims seriously.