
Comment by gellybeans

5 days ago

Making an account just to point out how these comments are far more exhausting, because they don't engage with the subject matter. They are just agreeing with a headline and saying, "See?"

You say, "explaining away the increasing performance" as though that were a good-faith representation of arguments made against LLMs, or even this specific article. Questioning the self-congratulatory nature of these businesses is perfectly reasonable.

But don't you think this might be a case where there is both self-congratulation and actual progress?

  • The level of proof for the latter is much higher, and IMO, OpenAI hasn't met the bar yet.

    Something really funky is going on with newer AI models and benchmarks, versus how they perform subjectively when I use them for my use cases. I say this across the board[1], not just regarding OpenAI. I don't know if frontier labs have run into Goodhart's law vis-à-vis benchmarks, or if my use cases are atypical.

    1. I first noticed this with Claude 3.5 vs Claude 3.7

  • That's a fair question, and I agree. I just find it odd how we shout across the aisle, whether in favor or against. It's a case of thinking the tech is neat, while cringing at all the money-people and their ideations.