Comment by zxexz

2 months ago

You should self-host rather than trust a third-party application if you run into either of those things. The weights are open. DeepSeek didn’t change; the application you’re accessing it through did.

Or use an enterprise-ready service: Bedrock, firecracker, etc.

I like your thinking. Nobody can use ChatGPT offline or retrain it, but DeepSeek is fully opensource. It's technology; I don't care which country made it. If it's high-quality engineering, it's just that. The data it was trained on doesn't matter if you can train a wholly new model with your own data, using the exact same principles and stack they open-sourced. Which is really awesome.

I use openrouter.ai to avoid timeouts and downtime, since DeepSeek seems to be getting DDoS attacks somehow, or there are too many users, idk.

  • > Nobody can use ChatGPT offline or retrain it, but DeepSeek is fully opensource.

    Well, you likely can't train DeepSeek yourself either.

    Most likely, either:

    * you don't have all the training data to train it yourself (so, philosophically, claims that it's open source or open-whatever are dubious in the first place);

    or

    * you don't have the compute to "press the train button" and get the weights back before the sun expires. While considered groundbreakingly cheap, those costs were still estimated to be around 6 million USD (DeepSeek claimed the model training took 2,788 thousand H800 GPU hours, which, at a cost of $2/GPU hour, comes out to a "mere $5.576 million"). I remember that when it was released, the mere thought that "people" could "train AI cheaply with only 6 million USD" caused one of the worst drops in Nvidia's valuation.
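For anyone who wants to check the arithmetic, here is a rough back-of-the-envelope sketch. The GPU-hour count and the $2/GPU hour rate are the figures quoted above; the 8-GPU home rig and the perfect linear scaling are purely my assumptions for illustration, not anything DeepSeek published.

```python
# Back-of-the-envelope check of the training-cost figure quoted above.
H800_GPU_HOURS = 2_788_000   # "2,788 thousand H800 GPU hours", as claimed
USD_PER_GPU_HOUR = 2.0       # rental rate assumed in the quoted estimate

def training_cost_usd(gpu_hours: float, usd_per_gpu_hour: float) -> float:
    """Total cost if every GPU hour is billed at a flat rate."""
    return gpu_hours * usd_per_gpu_hour

def wall_clock_years(gpu_hours: float, num_gpus: int) -> float:
    """Naive wall-clock time, assuming perfect linear scaling (unrealistic)."""
    hours = gpu_hours / num_gpus
    return hours / (24 * 365)

cost = training_cost_usd(H800_GPU_HOURS, USD_PER_GPU_HOUR)
print(f"${cost / 1e6:.3f} million")  # prints "$5.576 million"

# Hypothetical 8-GPU machine, just to show the scale of the compute problem.
HOME_RIG_GPUS = 8
years = wall_clock_years(H800_GPU_HOURS, HOME_RIG_GPUS)
print(f"about {years:.1f} years on {HOME_RIG_GPUS} GPUs")
```

So even if you had the data, a small rig would be grinding for decades under these generous assumptions, which is the "before the sun expires" point.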