Comment by pfisch
1 day ago
Yes. I also don't think it is realistic to pretend you understand how frontier LLMs operate just because you understand the basic principles behind the simple, much less capable LLMs that came before them.
It's even more ridiculous than me pretending I understand how a rocket works because I know there is fuel in a tank, it gets lit on fire somehow, and the rocket is aimed with some fins...
The frontier LLMs have the same overall architecture as earlier models. I absolutely understand how they operate. I have worked at a startup where we heavily finetuned Deepseek, among other smaller models, running on our own hardware. Both Deepseek's 671B model and a Mistral 7B model operate according to the exact same principles. There is no magic in the process, and there is zero reason to believe that Sonnet or Opus runs on some impossible-to-understand architecture that is fundamentally alien to that of every other LLM.
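To illustrate the "same principles" claim: here is a minimal sketch, assuming the Hugging Face transformers library and using illustrative repo IDs, of how a 7B and a 671B checkpoint go through the exact same decoder-only code path. Only the weights, config, and hardware requirements differ.

```python
# Minimal sketch (assumptions: transformers installed, repo IDs are
# illustrative, and you have the RAM/VRAM for the model you pick).
from transformers import AutoModelForCausalLM, AutoTokenizer

def generate(repo_id: str, prompt: str) -> str:
    # Same loading and decoding interface regardless of model scale.
    tokenizer = AutoTokenizer.from_pretrained(repo_id)
    model = AutoModelForCausalLM.from_pretrained(repo_id)
    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=64)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)

print(generate("mistralai/Mistral-7B-v0.1", "The transformer architecture"))
# A 671B checkpoint runs through the same call path; it just needs a
# cluster instead of a workstation.
```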
Deepseek and Mistral are both considerably behind Opus, and you could not make Deepseek or Mistral even if I gave you a big GPU cluster. You have the weights, but you have no idea how they work, and you couldn't recreate them.
> I have worked at a startup where we heavily finetuned Deepseek, among other smaller models, running on our own hardware.
Are you serious with this? I could go make a LoRA in a few hours with a GUI if I wanted to; a sketch of what that amounts to follows below. That doesn't make me qualified to talk about top-secret frontier AI model architecture.
Now you have moved on to being the guy who painted his Honda, swapped on some new rims, and put some lights under it. That person is not an automotive engineer.
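For scale, a minimal sketch of what "making a LoRA" involves, assuming the Hugging Face peft and transformers libraries; the base model ID and hyperparameters are illustrative placeholders, not anyone's actual setup.

```python
# Minimal LoRA setup sketch (assumptions: peft and transformers
# installed; model ID and hyperparameters are placeholders).
import torch
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = "mistralai/Mistral-7B-v0.1"  # illustrative base model
model = AutoModelForCausalLM.from_pretrained(base, torch_dtype=torch.float16)

# LoRA freezes the base weights and trains small low-rank adapter
# matrices injected into the attention projections.
config = LoraConfig(
    r=8,                                  # rank of the adapter matrices
    lora_alpha=16,                        # adapter scaling factor
    target_modules=["q_proj", "v_proj"],  # attention projections to adapt
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, config)
model.print_trainable_parameters()  # typically well under 1% of the base
# From here, a standard training loop over your dataset finishes the job.
```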
I'm not talking about a LoRA. It would be nice if you could refrain from acting like a dipshit.
> and you could not make Deepseek or Mistral even if I gave you a big GPU cluster. You have the weights, but you have no idea how they work, and you couldn't recreate them.
I personally couldn't, but the team behind that startup as a whole absolutely could. We did attempt to train our own models from scratch and made some progress, but the compute cost was too high to pursue seriously. It's not because we were some super special rocket scientists, either. There is a massive body of published literature on LLM architecture, and you can replicate the results by learning from it. You keep trying to make this out to be literal fucking magic, but it's just a computer program. I guess it helps you cope with your own complete lack of understanding to pretend that it's magical in nature and can't be understood.
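As a concrete instance of what that published literature describes: a minimal pre-norm decoder block in PyTorch, the basic building unit of GPT-style models. Dimensions are illustrative; this is a sketch of the standard recipe, not any particular frontier model.

```python
# Minimal decoder block sketch (assumptions: PyTorch installed;
# d_model/n_heads are illustrative; real models add details like
# rotary embeddings, KV caching, and different norm placements).
import torch
import torch.nn as nn

class DecoderBlock(nn.Module):
    def __init__(self, d_model: int = 512, n_heads: int = 8):
        super().__init__()
        self.ln1 = nn.LayerNorm(d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ln2 = nn.LayerNorm(d_model)
        self.mlp = nn.Sequential(
            nn.Linear(d_model, 4 * d_model),
            nn.GELU(),
            nn.Linear(4 * d_model, d_model),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Causal mask: each token attends only to earlier positions.
        seq_len = x.size(1)
        mask = torch.triu(torch.ones(seq_len, seq_len, dtype=torch.bool), 1)
        h = self.ln1(x)
        attn_out, _ = self.attn(h, h, h, attn_mask=mask)
        x = x + attn_out
        x = x + self.mlp(self.ln2(x))
        return x

# An LLM is essentially a stack of these blocks plus token embeddings
# and an output projection; the scale, not the structure, is what differs.
x = torch.randn(1, 16, 512)
print(DecoderBlock()(x).shape)  # torch.Size([1, 16, 512])
```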