← Back to context

Comment by pfisch

17 hours ago

Yes. I also don't think it is realistic to pretend you understand how frontier LLMs operate because you understand the basic principles of how the simple LLMs worked that weren't very good.

Its even more ridiculous than me pretending I understand how a rocket ship works because I know there is fuel in a tank and it gets lit on fire somehow and aimed with some fins on the rocket...

The frontier LLMs have the same overall architecture as earlier models. I absolutely understand how they operate. I have worked in a startup wherein we heavily finetuned Deepseek, among other smaller models, running on our own hardware. Both Deepseek's 671b model and a Mistral 7b model operate according to the exact same principles. There is no magic in the process, and there is zero reason to believe that Sonnet or Opus is on some impossible-to-understand architecture that is fundamentally alien to every other LLM's.

  • Deepseek and Mistral are both considerably behind Opus, and you could not make deepseek or mistral if I gave you a big gpu cluster. You have the weights but you have no idea how they work and you couldn't recreate them.

    > I have worked in a startup wherein we heavily finetuned Deepseek, among other smaller models, running on our own hardware.

    Are you serious with this? I could go make a lora in a few hours with a gui if I wanted to. That doesn't make me qualified to talk about top secret frontier ai model architecture.

    Now you have moved on to the guy who painted his honda, swapped out some new rims, and put some lights under it. That person is not an automotive engineer.