Comment by nurettin

10 days ago

I'm going to wear the tinfoil hat: a firm manages to produce a sought-after behavior a few months later, and it throws people off. Is it more likely that the firm (worth billions at this point) is engineering these solutions into the model, or that it's emergent neural network architectural magic?

I'm not saying they're being bad actors, just that this seems more probable to me than an LLM breakthrough.

It depends on what you mean by "engineering these solutions into the model". Using better data leads to better models given the same architecture and training. Nothing wrong with that; it's hard work, and it may well be done with a specific goal in mind. LLM "breakthroughs" aren't really a thing at this point. It's just one little thing after another.

  • Sure, and I said up front that I don't think it's ill will. What I mean is keeping tabs on the latest demand (newer benchmarks) and making sure their model delivers in some fashion. But that work is mundane and they don't say so. And when a major number increases, people don't assume they just added more specific training data.

    • Yup, it's a fair point. We got down to the nitty gritty with these things very quickly. Hopefully, like with semiconductors, the nitty gritty results in big performance gains for decades.