Comment by deepakkumarb
1 month ago
I get that these approaches work, and they’re totally valid engineering trade-offs. But I don’t think they’re the same thing as real model improvements. If we’re just throwing more tokens, longer chains of thought, or extra tools at the problem, that feels more like brute force than genuine progress.
And that distinction matters in practice. If getting slightly better answers means using 5–10× more tokens or a bunch of external calls, the costs add up fast. That doesn’t scale well in the real world. It’s hard to call something a breakthrough when quality goes up but the bill and latency go up just as much.
I also think we should be careful about reading too much into benchmarks. A lot of them reward clever prompting and tool orchestration more than actual general intelligence. Once you factor in reliability, speed, and cost, the story often looks less impressive.