Comment by anonymous908213
4 days ago
I'm not talking about a LoRA; it would be nice if you could refrain from acting like a dipshit.
> and you could not make deepseek or mistral if I gave you a big gpu cluster. You have the weights but you have no idea how they work and you couldn't recreate them.
I personally couldn't, but the team behind that startup as a whole absolutely could. We did attempt training our own models from scratch and made some progress, but the compute cost was too high for us to pursue it seriously. It's not because we were some super special rocket scientists, either. There is a massive body of published literature on LLM architecture already, and you can replicate the results by learning from it. You keep attempting to make this out to be literal fucking magic, but it's just a computer program. I guess it helps you cope with your own complete lack of understanding to pretend that it is magical in nature and can't be understood.
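For what it's worth, the core architecture really is in the open literature. Here's a minimal sketch of a decoder-only transformer in PyTorch following the published papers (Vaswani et al. 2017, the GPT-2 report); every dimension and layer count below is an illustrative placeholder I picked, not anyone's production config:

```python
import torch
import torch.nn as nn

class Block(nn.Module):
    """One decoder block: causal self-attention + MLP, pre-layernorm."""
    def __init__(self, d_model=256, n_heads=4):
        super().__init__()
        self.ln1 = nn.LayerNorm(d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ln2 = nn.LayerNorm(d_model)
        self.mlp = nn.Sequential(
            nn.Linear(d_model, 4 * d_model),
            nn.GELU(),
            nn.Linear(4 * d_model, d_model),
        )

    def forward(self, x):
        # Causal mask: each token may attend only to earlier positions.
        t = x.size(1)
        mask = torch.triu(torch.ones(t, t, dtype=torch.bool, device=x.device),
                          diagonal=1)
        h = self.ln1(x)
        attn_out, _ = self.attn(h, h, h, attn_mask=mask, need_weights=False)
        x = x + attn_out
        x = x + self.mlp(self.ln2(x))
        return x

class TinyLM(nn.Module):
    """Token + learned position embeddings, a stack of blocks, LM head."""
    def __init__(self, vocab=50257, d_model=256, n_layers=4, max_len=512):
        super().__init__()
        self.tok = nn.Embedding(vocab, d_model)
        self.pos = nn.Embedding(max_len, d_model)
        self.blocks = nn.ModuleList(Block(d_model) for _ in range(n_layers))
        self.ln_f = nn.LayerNorm(d_model)
        self.head = nn.Linear(d_model, vocab, bias=False)

    def forward(self, idx):
        pos = torch.arange(idx.size(1), device=idx.device)
        x = self.tok(idx) + self.pos(pos)
        for block in self.blocks:
            x = block(x)
        return self.head(self.ln_f(x))
```

That's the whole "magic": what separates this toy from a frontier model is scale, data, and engineering, not some secret unknowable mechanism.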
No, it's just obvious that there is a massive race going on with trillions of dollars on the line. No one is going to reveal the details of how they are making these AIs. Any public information that exists about them is way behind SOTA.
I strongly suspect it is really hard to get these models to converge, though, so I have no idea what your team could theoretically have built, but it would certainly have been well behind SOTA.
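Even the public recipes hint at how finicky convergence is. Take something as basic as the learning-rate schedule: published papers (the original transformer paper, GPT-3) describe warmup followed by decay, but the numbers below are my illustrative guesses, not anyone's actual hyperparameters, and the real tuning around this is exactly the part nobody publishes:

```python
import math

def lr_at(step, max_lr=6e-4, warmup=2000, total=100_000, min_lr=6e-5):
    """Linear warmup then cosine decay, a widely published stabilizer."""
    if step < warmup:
        # Ramp up from 0 to avoid blowing up early training.
        return max_lr * step / warmup
    # Cosine decay from max_lr down to min_lr over the remaining steps.
    progress = (step - warmup) / (total - warmup)
    return min_lr + 0.5 * (max_lr - min_lr) * (1 + math.cos(math.pi * progress))
```

Get a knob like that wrong at scale and the loss diverges after millions of dollars of compute, which is why I doubt a startup-sized run gets anywhere near the frontier.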
My point is that if they are changing core elements of the architecture, you would have no idea, because they wouldn't be telling anyone about it. So thinking you know how Opus 4.6 works just isn't realistic until development slows down and more information about these models comes out.