Comment by zxexz
2 months ago
If you run into either of those things, you should self-host rather than trust a third-party application. The weights are open. DeepSeek didn't change; the application you're accessing it through did.
Or use an enterprise-ready service: Bedrock, Firecracker, etc.
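For reference, a minimal self-hosting sketch, assuming vLLM and one of the distilled DeepSeek checkpoints small enough for a single GPU (the full R1 needs a multi-GPU server):

```python
# Minimal local inference sketch using vLLM.
# Assumes a single GPU with enough VRAM for the distilled 7B checkpoint;
# the full DeepSeek-R1 will not fit on one card.
from vllm import LLM, SamplingParams

llm = LLM(model="deepseek-ai/DeepSeek-R1-Distill-Qwen-7B")
params = SamplingParams(temperature=0.6, max_tokens=512)

outputs = llm.generate(["Explain mixture-of-experts in two sentences."], params)
print(outputs[0].outputs[0].text)
```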
I like your thinking. Nobody can use ChatGPT offline or retrain it, but DeepSeek is fully open source. It's technology; I don't care which country made it. If it's high-quality engineering, it's just that. The data it was trained on doesn't matter if you can train a wholly new model on your own data using the exact same principles and stack they open-sourced, which is really awesome.
I use openrouter.ai to avoid the timeouts and downtime, since DeepSeek's own service seems to get DDoS attacks somehow, or there are just too many users, idk.
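In case it helps, a minimal sketch of calling DeepSeek through OpenRouter's OpenAI-compatible endpoint; it assumes the openai Python client, an OPENROUTER_API_KEY environment variable, and the model id "deepseek/deepseek-r1" (check their catalog for the current id):

```python
# Query DeepSeek via OpenRouter's OpenAI-compatible API.
# Assumes `pip install openai` and an OPENROUTER_API_KEY env var.
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key=os.environ["OPENROUTER_API_KEY"],
)

resp = client.chat.completions.create(
    model="deepseek/deepseek-r1",  # assumed model id; verify on openrouter.ai
    messages=[{"role": "user", "content": "Why is the sky blue?"}],
)
print(resp.choices[0].message.content)
```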
> Nobody can use ChatGPT offline or retrain it, but DeepSeek is fully open source.
Well, you likely can't train DeepSeek yourself either.
You most likely:
* don't have all the training data to train it yourself (so the claims that it's open source, or open-whatever, are dubious in the first place);
or
* don't have the compute to "press the train button" and get the weights back before the sun expires. While considered ridiculously, ground-breakingly cheap, the cost was still estimated at around 6 million USD (DeepSeek reported the training took 2.788 million H800 GPU hours, which, at $2 per GPU hour, comes out to a "mere" $5.576 million; see the quick arithmetic check below). I remember that when it was released, the mere thought that "people" could "train AI cheaply for only 6 million USD" caused one of the worst drops in Nvidia's valuation.
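For what it's worth, that headline number is just hours times an assumed hourly rate:

```python
# Back-of-the-envelope check of the quoted training cost:
# 2.788 million H800 GPU hours at an assumed $2 per GPU hour.
gpu_hours = 2_788_000
usd_per_gpu_hour = 2.0

total_cost = gpu_hours * usd_per_gpu_hour
print(f"${total_cost:,.0f}")  # -> $5,576,000
```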
This is really not true, my friend. I would love to help you if I had some more time, but let me look for a tutorial.
Because the FineWeb dataset is already super good, you can train 7B or 32B parameter models at home.
The >600B parameter model isn't really using all the data effectively, but with a Mac Studio farm you can also train that one at home (if you have enough money to buy at least 100 of them).
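If you want to poke at the data side first, here's a minimal sketch for streaming FineWeb with the Hugging Face datasets library; the "sample-10BT" subset name is an assumption on my part, so check the dataset card:

```python
# Stream a FineWeb sample without downloading the whole dataset.
# Assumes `pip install datasets`; subset name taken from the dataset card.
from datasets import load_dataset

fineweb = load_dataset(
    "HuggingFaceFW/fineweb",
    name="sample-10BT",
    split="train",
    streaming=True,
)

for i, doc in enumerate(fineweb):
    print(doc["text"][:200])
    if i == 2:
        break
```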
Here's the easy way: https://github.com/FareedKhan-dev/train-deepseek-r1
More details: https://www.bentoml.com/blog/the-complete-guide-to-deepseek-...
Here's how DeepSeek-R1-Zero was built, basically from zero to hero, including the weights, the FULL training data, and everything you need to get it running locally or on servers: https://medium.com/@GenerationAI/how-deepseek-r1-zero-was-re...
For about $30 you can also train a small DeepSeek-style model at home!
https://github.com/Jiayi-Pan/TinyZero
https://huggingface.co/deepseek-ai/DeepSeek-R1-Zero (the model)
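To give a flavor of what those tutorials walk through: R1-Zero-style training is essentially RL over a base model with a rule-based reward (GRPO). Below is a toy sketch, assuming a recent trl release that ships GRPOTrainer, a small base model, and a placeholder reward function; it is not the exact TinyZero or DeepSeek recipe.

```python
# Toy GRPO sketch in the spirit of TinyZero / R1-Zero (not their exact recipe).
# Assumes a recent `trl` with GRPOTrainer and a GPU that fits a small base model.
from datasets import load_dataset
from trl import GRPOConfig, GRPOTrainer

dataset = load_dataset("trl-lib/tldr", split="train")  # any prompt dataset works

def toy_reward(completions, **kwargs):
    # Placeholder reward: prefer ~200-character completions. Swap in an
    # exact-match or format-based verifier for real reasoning training.
    return [-abs(200 - len(c)) for c in completions]

trainer = GRPOTrainer(
    model="Qwen/Qwen2.5-0.5B-Instruct",
    reward_funcs=toy_reward,
    args=GRPOConfig(output_dir="tiny-grpo", per_device_train_batch_size=2),
    train_dataset=dataset,
)
trainer.train()
```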