Comment by jiggawatts
2 days ago
> We've been having really good models for a couple of years now...
Don’t let the “wow!” factor of the novelty of LLMs cloud your judgement. Today’s models are very noticeably smarter, faster, and more useful overall.
I’ve had a few toy problems that I’ve fed to various models since GPT-3, and the difference in output quality is stark.
Just yesterday I was demonstrating to a colleague that both o3-mini and Gemini Flash Thinking can solve a fairly esoteric coding problem.
That same problem went from multiple failed attempts that needed to be manually stitched together, just six months ago, to 3 out of 5 responses being valid and only 5% of output lines needing light touch-ups.
That’s huge.
PS: It’s a common statistical error to measure progress by success rate instead of error rate. Going from 99% success to 99.9% looks like a sub-1% improvement, but the error rate drops from 1% to 0.1% — 10x fewer mistakes. Most AI benchmarks still report success rate, but they ought to start focusing on error rate soon to avoid underselling model capabilities.
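To make the arithmetic concrete, here's a quick sketch — the 99%/99.9% figures are the illustrative numbers from above, not taken from any real benchmark:

```python
# Compare two hypothetical models by error rate rather than success rate.
old_success = 0.99   # hypothetical older model: 99% of responses valid
new_success = 0.999  # hypothetical newer model: 99.9% of responses valid

old_error = 1 - old_success  # 0.01  -> 1 error per 100 attempts
new_error = 1 - new_success  # 0.001 -> 1 error per 1000 attempts

print(f"Success-rate gain: {new_success - old_success:.3f}")   # 0.009 (looks tiny)
print(f"Error reduction:   {old_error / new_error:.1f}x fewer errors")  # 10.0x
```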