Comment by shadeslayer_
11 hours ago
Do these benchmarks even add any value at this point? This one is basically Cursor saying that their model is as good as the frontier ones at a fraction of the price. The independent benchmarks are probably part of the training data by now, and the models are pattern-matching against them all the time. The final test of a model (and the harness, probably) is how well it works FOR YOU. Since most of the models can pretty much do most of our daily tasks, it boils down to which one has the least friction to use.