Comment by uh_uh

7 months ago

But don't you think this might be a case where there is both self-congratulation and actual progress?

2 comments

uh_uh

The level of proof for the latter is much higher, and IMO, OpenAI hasn't met the bar yet.

Something really funky is going on with newer AI models and benchmarks, versus how they perform subjectively when I use them for my use-cases. I say this across the board[1], not just regarding OpenAI. I don't know if frontier labs have run into Goodhart's law vis-à-vis benchmarks, or if my use-cases are atypical.

1. I first noticed this with Claude 3.5 vs Claude 3.7.

gellybeans 7 months ago

That's a fair question, and I agree. I just find it odd how we shout across the aisle, whether in favor or against. It's a case of thinking the tech is neat, while cringing at all the money-people and their ideations.