
Comment by thundergolfer

1 day ago

“Pure garage-energy” is a great phrase.

Most interested to see their inference stack; hope that’s one of the 5. I think most people are running R1 on a single H200 node, but DeepSeek had much less RAM per GPU for their inference and so ran some cluster-based MoE deployment.

Their tech report says one inference deployment is around 400 GPUs...

  • You need that scale to optimize load balancing. Unfortunately that gain is not available to small or individual deployments (rough sketch of the capacity math below).
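
    Rough capacity sketch of why the small-deployment story is different (illustrative only: the ~671B parameter count is public, but the ~1 byte/param FP8 figure and the per-GPU overhead below are my assumptions, not numbers from their report):

      # Illustrative capacity math only -- not DeepSeek's code.
      import math

      TOTAL_WEIGHTS_GB = 671   # ~671B params at ~1 byte/param (FP8)
      H200_HBM_GB = 141        # H200 card
      H800_HBM_GB = 80         # H800/H100-class card
      OVERHEAD_GB = 25         # assumed KV cache + activations + comm buffers per GPU

      def min_gpus(hbm_gb: float) -> int:
          """Smallest GPU count whose combined free HBM holds the sharded weights."""
          return math.ceil(TOTAL_WEIGHTS_GB / (hbm_gb - OVERHEAD_GB))

      print("141 GB cards:", min_gpus(H200_HBM_GB), "GPUs")  # fits in one 8-GPU node
      print("80 GB cards:", min_gpus(H800_HBM_GB), "GPUs")   # spills past one node -> shard experts across a cluster

    Presumably the gap between that bare minimum and their ~400-GPU figure is what buys balanced expert routing and throughput, which is exactly the part I'd like to see.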

I don't think the RAM size of the H800 was nerfed (it's still 80 GB), but rather the interconnect bandwidth between GPUs.

But yeah, would be interesting to see how they optimized for that.

  • Correct. There are 3 main ways to "gimp" high-end GPUs meant for training: "cores", "on-chip memory speed", and "interconnects". IIUC the H800 had the first two unchanged but halved the interconnect speeds.

    The H20 is the next iteration of the "sanctions"; I believe it also limited the "cores" but left the on-chip memory intact, or slightly faster (it's from the new generation). Rough numbers on why the H800's halved interconnect matters for MoE are below.
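
    Back-of-envelope on why halving the GPU-to-GPU link hurts MoE inference in particular (model-shape numbers are DeepSeek-V3-ish, link bandwidths are approximate, and the bf16 activation assumption is mine): each token's activations get dispatched to the GPUs holding its routed experts and combined back at every MoE layer, so the all-to-all traffic scales with hidden size × experts per token × MoE layers.

      # Rough upper bound on tokens/s if the GPU-to-GPU link is the bottleneck.
      HIDDEN_DIM = 7168        # DeepSeek-V3-ish hidden size
      EXPERTS_PER_TOKEN = 8    # routed experts per token
      MOE_LAYERS = 58          # MoE layers (the first few layers are dense)
      BYTES_PER_VALUE = 2      # bf16 activations (assumed)

      # factor of 2: dispatch to experts + combine the results back
      bytes_per_token = 2 * EXPERTS_PER_TOKEN * HIDDEN_DIM * BYTES_PER_VALUE * MOE_LAYERS

      for name, gb_s in [("~900 GB/s (H100-class)", 900), ("~400 GB/s (H800, halved)", 400)]:
          print(f"{name}: <= {gb_s * 1e9 / bytes_per_token:,.0f} tokens/s per link")

    So the H800 roughly halves that ceiling; presumably that's what their comms-overlap and expert-placement tricks are working around.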

“Pure garage-energy” with 10,000 A100s, apparently. I’d love to have a garage like that.

  • From https://semianalysis.com/2025/01/31/deepseek-debates/

    > We believe DeepSeek has access to around 10,000 of these H800s and about 10,000 H100s. Furthermore they have orders for many more H20’s, with Nvidia having produced over 1 million of the China specific GPU in the last 9 months.

    • The paper in the repo says: “For DL training, we deployed the Fire-Flyer 2 with 10,000 PCIe A100 GPUs”

    • That report is lazy. They assume all GPUs owned (openly reported) by the parent company (a hedge fund which claims to use those GPUs to generate trades) were used by the company it invested in.

      That's as dumb as saying Coca-Cola has access to all of Berkshire Hathaway's offices.

      Likewise, all comments praising DeepSeek's history are also misleading, as the company has barely existed for a year.

      Everything is opaque marketing being repeated. Just drop the off-topic blah blah blah and focus on the facts and code in front of you.

      Thanks for coming to my TED talk.