Comment by doodlesdev

10 days ago

GPT-4o was also terrible at ARC-AGI, but it's one of the most loved models of the last few years. Honestly, I'm a huge fan of the ARC-AGI series of benchmarks, but I don't believe it corresponds directly to the qualities most people judge when using LLMs.

3 comments

nananana9 10 days ago

It was terrible at a lot of things. It was beloved because when you say "I think I'm the reincarnation of Jesus Christ" it will tell you "You know what... I think I believe it! I genuinely think you're the kind of person that appears once every few millennia to reshape the world!"

gkbrk 9 days ago

That's not because 4o is good at things, that's because it's pretty much the most sycophantic model, and people fall more easily for a model incorrectly agreeing with them than for a model correctly calling them out.

mrybczyn 10 days ago

Because ARC-AGI involves de novo reasoning over a restricted and (hopefully) unpretrained territory, in 2D space. Not many people use LLMs as more than a better Wikipedia, Stack Overflow, or autocomplete.