← Back to context

Comment by I_am_tiberius

9 hours ago

Open weight!

Please don't slander the most open AI company in the world. Even more open than some non-profit labs from universities. DeepSeek is famous for publishing everything. They might take a bit to publish source code but it's almost always there. And their papers are extremely pro-social to help the broader open AI community. This is why they struggle getting funded because investors hate openness. And in China they struggle against the political and hiring power of the big tech companies.

Just this week they published a serious foundational library for LLMs https://github.com/deepseek-ai/TileKernels

Others worth mentioning:

https://github.com/deepseek-ai/DeepGEMM a competitive foundational library

https://github.com/deepseek-ai/Engram

https://github.com/deepseek-ai/DeepSeek-V3

https://github.com/deepseek-ai/DeepSeek-R1

https://github.com/deepseek-ai/DeepSeek-OCR-2

They have 33 repos and counting: https://github.com/orgs/deepseek-ai/repositories?type=all

And DeepSeek often has very cool new approaches to AI copied by the rest. Many others copied their tech. And some of those have 10x or 100x the GPU training budget and that's their moat to stay competitive.

The models from Chinese Big Tech and some of the small ones are open weights only. (and allegedly benchmaxxed) (see https://xcancel.com/N8Programs/status/2044408755790508113). Not the same.

  • DeepSeek's models are indeed open weight. Why do you feel that pointing this out would be considered slander?

    • I think they were reading GP's comment as a correction. Like "not open-source, just open weight". I'm not sure if their reading was accurate but I enjoyed their high effort comment nonetheless

  • It’s not slander to say something true. These are open weights, not open source. They don’t provide the training data or the methodology requires to reproduce these weights.

    So you can’t see what facts are pruned out, what biases were applied, etc. Even more importantly, you can’t make a slightly improved version.

    This model is as open source as a windows XP installation ISO.

Weights are the source, training data is the compiler

  • Training data == source code, training algorithm == compiler, model weights == compiled binary.

    • Training algorithm is the programmer, weights are the code that you run in an interpreter

  • isn't it more like the data is the source, the training process is the compiler, and the weights are the binary output.