Comment by Papazsazsa
3 days ago
"Opus 4.5 feels to me like"
The article is fine opinion but at what point are we going to either:
a) establish benchmarks that make sense and are reliable, or
b) stop with the hypecycle stuff?
3 days ago
"Opus 4.5 feels to me like"
The article is fine opinion but at what point are we going to either:
a) establish benchmarks that make sense and are reliable, or
b) stop with the hypecycle stuff?
>establish benchmarks that make sense and are reliable
How aren't current LLM coding benchmarks reliable?
They're manipulated.
Unless you are going to be more specific, that criticism applies to all benchmarks that are connected to a positive gain, not just AI coding benchmarks.
> make sense and are reliable
If you can figure out how to create benchmarks that make sense, are reliable, correlate strongly to business goals, and don't get immediately saturated or contorted once known, you are well on your way to becoming a billionaire.