Comment by jiggawatts

1 day ago

> We've been having really good models for a couple of years now...

Don’t let the novelty-driven “wow!” factor of LLMs cloud your judgement. Today’s models are very noticeably smarter, faster, and more useful overall.

I’ve had a few toy problems that I’ve fed to various models since GPT-3, and the difference in output quality is stark.

Just yesterday I was demonstrating to a colleague that both o3-mini and Gemini Flash Thinking can solve a fairly esoteric coding problem.

Just six months ago, that same problem took multiple failed attempts that had to be manually stitched together. Now, 3 out of 5 responses are valid, and only 5% of output lines need light touch-ups.

That’s huge.

PS: It’s a common statistical error to track the success rate instead of the error rate. Going from 99% success to 99.9% is not a mere 0.9-point improvement: the error rate drops from 1% to 0.1%, which is 10x better! Most AI benchmarks still report success rate, but they ought to start focusing on the error rate soon to avoid underselling the models’ capabilities.
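
To put numbers on that, here’s a minimal sketch using only the figures from the example above (hypothetical, not from any real benchmark):

```python
# Toy illustration of success rate vs. error rate, using only the
# numbers from the example above (hypothetical, not benchmark data).

def error_rate_improvement(old_success: float, new_success: float) -> float:
    """How many times smaller did the error rate get?"""
    return (1.0 - old_success) / (1.0 - new_success)

# 99% -> 99.9% success: the error rate drops from 1% to 0.1%.
print(error_rate_improvement(0.99, 0.999))  # ~10.0, i.e. a 10x reduction
```

The same 0.9-point gain in success rate reads as nearly flat on a success-rate chart, but as an order-of-magnitude improvement on an error-rate chart.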