Comment by sigbottle

12 hours ago

Very interested in this! I'm mainly a ChatGPT user; for me, o3 was the first sign of true "intelligence" (not 'sentience' or anything like that, just actual, genuine usefulness). Are these models at that level yet? Or are they o1? Still GPT4 level?

Not nearly o3 level. Much better than GPT4, though! For instance Qwen 3 30b-a3b 2507 Reasoning gets 46 vs GPT 4's 21 and o3's 60-something on Artificial Analysis's benchmark aggregation score. Small local models ~30b params and below tend to benchmark far better than they actually work, too.