Comment by 2ndorderthought
17 hours ago
I really don't think open models will lose. I think they're cheaper to train precisely because they have to be more efficient than the monstrosities we have now.
There is no theory that says the capabilities of today's frontier models can't exist in models with 1/100th the compute waste ;). When we start trending in that direction, and oh wow we truly are, there will be no reason for these services. You could run the models on your own hardware without serious investment.
The moat OpenAI and Anthropic have is that they, among others, have attempted to buy up all of the compute hardware for the next two years. That's intentional. They know the only existential threat to them is someone coming up with a way to do this better than they can. It's already happened, and it's only going to diverge further from here.
I’m interested in learning more about your theory that these models can be trained more cheaply. Is anyone doing it from scratch, rather than adversarial distillation?
It is a lot cheaper to train a 27B model such as qwen3.6, which you can even vibe code or do agentic coding with, than it is to train a 1T+ parameter model. It runs on a single commodity GPU, for goodness' sake.
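To make "runs on a single commodity GPU" concrete, here's a minimal sketch using Hugging Face transformers with 4-bit quantization via bitsandbytes. The model id below is a placeholder, not a real checkpoint; swap in whichever open ~27B model you actually use. Rough math: 27B params at 4 bits per weight is about 13.5 GB, which fits on a 16-24 GB consumer card, with the KV cache on top.

```python
# Sketch: load a ~27B open model on one consumer GPU with 4-bit quantization.
# Requires: pip install transformers accelerate bitsandbytes
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

MODEL_ID = "some-org/some-27b-model"  # placeholder id, not a real repo

quant = BitsAndBytesConfig(
    load_in_4bit=True,                      # quantize weights to 4 bits (~0.5 bytes/param)
    bnb_4bit_compute_dtype=torch.bfloat16,  # do the matmuls in bf16 for quality
)

tok = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    quantization_config=quant,
    device_map="auto",  # place layers on the single available GPU
)

prompt = "Write a function that reverses a linked list."
inputs = tok(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=256)
print(tok.decode(out[0], skip_special_tokens=True))
```

Whether this is fast enough for agentic coding depends on the card and context length, but it's the kind of setup people mean when they say a 27B model is locally runnable.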
It's not a theory. These smaller models that are coming out are huge advances for the field.
I can't comment on companies' training practices; that would be proprietary, I guess. But I think the claim that these advances are due to distillation alone is completely unfair. The advances aren't just data.
It almost doesn't matter if it's trained using adversarial distillation: if it's nearly as good at one-hundredth the cost, the choice is obvious.