Comment by aucisson_masque

15 hours ago

There isn't even DeepSeek V4.

I'd rather trust the LM Arena leaderboard, which puts it on par with Sonnet.


aucisson_masque

LM Arena uses human side-by-side voting, which limits its applicability to complex tasks.

The ARC Prize leaderboard does have DeepSeek V3.2, which scored only 4% on ARC-AGI 2 (while the top models score over 80%). It also has Kimi and Qwen, but they didn't perform well either.