Comment by lysace
5 months ago
It's fascinating how close these companies are to each other. Some company comes up with something clever/ground-breaking and everyone else has implemented it a few weeks later.
Hard not to think of Kurzweil's Law of Accelerating Returns.
It’s extremely unlikely that everyone is copying within a few weeks for models that themselves take many weeks if not longer to train. Great minds think alike, and everyone is influencing everyone. The history of innovation is filled with examples of similar discoveries made around the same time by people totally disconnected from each other. Now, with the rate of publishing and the openness of the internet, you’re bound to get even more of that.
There's never been a scientific field in history with the same radical openness norms that AI/computational linguistics folks have (all papers are free/open access, and models/datasets are usually released openly, often under MIT or similarly permissive licenses).
We have whoever runs NeurIPS/ICLR/ICML and the ACL to thank for this situation. Imagine if fucking Elsevier had a stranglehold on our industry too!
https://en.wikipedia.org/wiki/Association_for_Computational_...
> for models that themselves take many weeks if not longer to train.
They all have heavily pretrained foundation models, and then they can do follow-up experimental training much faster.
The copying here probably goes back to Strawberry (o1), which is at least 6 months old, and the copying efforts may have started even earlier.
Isn't the reasoning thing essentially a bolt-on to existing trained models? Like basically a meta-prompt?
No.
DeepSeek and related projects have now shown it's possible to add reasoning to existing models via SFT, but that's not the same as a prompt. And if you look at R1, they use a blend of techniques to get reasoning.
For Anthropic to have a hybrid model where you can control this, it will have to be built directly into the model's training, and probably its architecture as well.
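For what it's worth, the SFT route really is just ordinary fine-tuning on reasoning traces. Here's a minimal sketch of the idea; the model name, the <think> tags, and the single example are all placeholder assumptions on my part, not DeepSeek's actual recipe:

    # Minimal sketch of "adding reasoning via SFT": plain causal-LM
    # fine-tuning on chain-of-thought traces. Everything here is
    # illustrative, not the R1 training recipe.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_name = "Qwen/Qwen2.5-0.5B"  # stand-in for any open base model
    tok = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name)
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

    # One training example: question + full reasoning trace + answer,
    # e.g. a trace sampled from a stronger reasoning model.
    example = (
        "Q: What is 17 * 23?\n"
        "<think>17 * 23 = 17 * 20 + 17 * 3 = 340 + 51 = 391</think>\n"
        "A: 391"
    )

    batch = tok(example, return_tensors="pt")
    # Standard next-token loss over the whole sequence, trace included:
    # the model learns to produce the reasoning itself, not just the answer.
    loss = model(**batch, labels=batch["input_ids"]).loss
    loss.backward()
    optimizer.step()

Nothing about this requires a prompt at inference time; the reasoning behavior gets baked into the weights.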
If you’re a competent company filled with the best AI minds and sitting on a frontier model, you’re not just purely copying… you’re taking ideas while innovating and adapting.
The fundamental innovation is training the model to reason through reinforcement learning. You can train existing models on traces from these reasoning models to get within the same ballpark, but taking it further requires you to do RL yourself.
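To make the RL part concrete, here's a toy sketch of the core idea behind R1-style (GRPO-flavored) training: sample several completions per prompt, score them with a verifiable reward, and compute group-relative advantages for the policy update. The reward format and all names are illustrative assumptions, not DeepSeek's code:

    import re
    import statistics

    def reward(completion: str, gold_answer: str) -> float:
        # Verifiable reward: 1.0 if the final answer matches, else 0.0.
        m = re.search(r"A:\s*(\S+)", completion)
        return 1.0 if m and m.group(1) == gold_answer else 0.0

    def group_advantages(rewards: list[float]) -> list[float]:
        # GRPO-style: normalize rewards within the sampled group,
        # so no separate value network is needed.
        mean = statistics.mean(rewards)
        std = statistics.pstdev(rewards) or 1.0
        return [(r - mean) / std for r in rewards]

    # Four sampled completions for one math prompt whose gold answer is 391:
    completions = ["...A: 391", "...A: 340", "...A: 391", "...A: 17"]
    rewards = [reward(c, "391") for c in completions]
    print(group_advantages(rewards))  # [1.0, -1.0, 1.0, -1.0]
    # Above-average completions get positive advantage; the policy
    # gradient reinforces whatever reasoning produced them. That's the
    # part you can't get from imitating traces alone.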
Somewhat but not exactly? I think the models need to be trained to think.
It does seem like it will be very, very hard for the companies training their own models to recoup their investment when the capabilities of open-weight models catch up so quickly. General-purpose LLMs just seem destined to become a cheap commodity.
Well, the companies releasing open weights also need to recoup their investments at some point; they can't coast on VC hype forever. Huge models don't grow on trees.
Or, like Meta, they make their money elsewhere and just seem interested in wrecking the economics of LLMs. As soon as an open-weight model is released, it basically sets a global floor that says "Models with similar or worse performance effectively have zero value," and that floor has been rising incredibly quickly. I'd be surprised if the vast, vast majority of queries ChatGPT gets couldn't get equivalently good results from llama3/deepseek/qwen/mistral models, even for those paying for the pro versions.
Where RL plays into post-training, there's something of an anti-moat. Maybe a "tow rope"?
Let's say OAI releases some great new model. The moment it becomes available via API, everyone else can make use of that model to create high-quality RL training data, which can then be used to make their models perform better.
The very act of making an AI model commercially available is the same act that allows your competitors to pull themselves closer to you.
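Concretely, the "tow rope" is just distillation through the front door. A sketch using the standard OpenAI Python client; the model choice and prompt are illustrative, and note that hidden chains of thought aren't returned by the API, so you only get to distill the visible output:

    # Harvest completions from a competitor's API as training data.
    # Assumes OPENAI_API_KEY is set in the environment.
    from openai import OpenAI

    client = OpenAI()

    prompts = ["Prove that the sum of two even numbers is even."]
    dataset = []
    for p in prompts:
        resp = client.chat.completions.create(
            model="o1-mini",  # illustrative: any strong API model works
            messages=[{"role": "user", "content": p}],
        )
        # The visible completion becomes a training target for your own model.
        dataset.append({"prompt": p, "completion": resp.choices[0].message.content})

    # `dataset` now feeds SFT (as sketched above) or seeds further RL.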