
Comment by ElectricalUnion

2 months ago

> Nobody can use ChatGPT offline or retrain it, but DeepSeek is fully opensource.

Well, you likely can't train DeepSeek yourself either.

You most likely either:

* don't have all the training data to train it yourself (so the claim that it's opensource, or open-anything, is dubious in the first place);

or

* don't have the compute to "press the train button" and get the weights back before the sun expires. While considered ridiculously, ground-breakingly cheap, those costs were still estimated at around 6 million USD (DeepSeek claimed the model training took 2,788 thousand H800 GPU hours, which, at a cost of $2/GPU-hour, comes out to a "mere" $5.576 million). I remember that when it was released, the mere thought that "people" could "train AI cheaply with only 6 million USD" triggered one of the worst drops in Nvidia's valuation.
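That cost figure is just a multiplication, and it checks out. A quick sanity check (the $2/GPU-hour rate is the assumed rental price from the estimate above, not a published invoice):

```python
# Back-of-envelope check of the quoted DeepSeek training cost.
gpu_hours = 2_788_000   # reported H800 GPU-hours ("2,788 thousand")
rate_usd = 2.0          # assumed rental price per GPU-hour
cost = gpu_hours * rate_usd
print(f"${cost:,.0f}")  # -> $5,576,000
```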

This is really not true, my friend. I would love to help you if I had some more time, but let me look for a tutorial.

The FineWeb dataset is already super good. You can train 7B or 32B parameter models at home.

The >600B parameter model isn't really using all the data effectively, but with a Mac Studio farm you can also train that one at home (if you have enough money to buy at least 100).
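To put rough numbers on the "at home" claim: a common rule of thumb for mixed-precision Adam training (popularized by the ZeRO paper) is about 16 bytes of memory per parameter, before activations and any parallelism overhead. This is a back-of-envelope sketch under that assumption, not measured figures, and the 671B model is a MoE, so its active-parameter footprint per step is smaller than the naive total suggests:

```python
# Rough training-memory estimate: ~16 bytes/param for mixed-precision Adam
# (fp16 weights + fp16 grads + fp32 master weights + two fp32 Adam moments).
def training_memory_gb(params_billion: float, bytes_per_param: int = 16) -> float:
    """Approximate GPU/unified memory needed just for model state, in GB."""
    return params_billion * bytes_per_param  # 1e9 params * bytes / 1e9 bytes-per-GB

for p in (7, 32, 671):
    print(f"{p}B params: ~{training_memory_gb(p):,.0f} GB of model state")
```

So a 7B model (~112 GB of model state) is plausible on a couple of large-memory machines, while the full 671B model (~10 TB naively) is what drives the "farm of Mac Studios" framing.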

Here's the easy way: https://github.com/FareedKhan-dev/train-deepseek-r1

More details: https://www.bentoml.com/blog/the-complete-guide-to-deepseek-...

Here's how DeepSeek-R1-Zero was built, basically from 0 to hero, including weights, the FULL training data, and everything you need to get it running locally or on servers: https://medium.com/@GenerationAI/how-deepseek-r1-zero-was-re...

For $30 USD you can also train a small DeepSeek at home!

https://github.com/Jiayi-Pan/TinyZero

https://huggingface.co/deepseek-ai/DeepSeek-R1-Zero (the model)

  • Ok, but those sources or methods will not reproducibly build the artifact that is the weights of DeepSeek R1 671B, which you claimed are "opensource". Because you can't see what they actually used to build it.

    DeepSeek didn't publish the exact dataset required to create it. How is having zero visibility over "the source" used to create something considered "opensource"?

    That extended definition of "opensource" is useless as almost anything that isn't unique in the universe can then be declared "opensource".