Comment by DoctorOetker

11 hours ago

This makes no sense. The heat radiated by the GPUs came from electrical energy, and that electrical energy came from the efficient fraction of the energy hitting the solar panels; the inefficient fraction heats the panels themselves. So the total amount of heat that needs to be dissipated is simply the total amount of energy incident on the solar panels.
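Spelled out as an energy balance (a rough sketch, assuming the panels absorb all incident light and all electrical power ends up as heat; the symbols $\eta$ for panel efficiency and $P_{\text{in}}$ for incident power are introduced here only for illustration):

```latex
P_{\text{heat}}
  = \underbrace{\eta \, P_{\text{in}}}_{\text{electricity, dissipated by the GPUs}}
  + \underbrace{(1 - \eta) \, P_{\text{in}}}_{\text{heating of the panels}}
  = P_{\text{in}}
```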

4 comments

tempestn 9 hours ago

True, the solar panels would need to be enormous too.

DoctorOetker 5 hours ago
Let's say we wanted to train LLaMa 3.1 405B:
[0] https://developer.nvidia.com/deep-learning-performance-train...
Click the "Large Language Model" tab next to the default "MLPerf Training" tab.
That takes 16.8 days on 128 B200 GPUs:
> Llama3 405B 16.8 days on 128x B200
A DGX B200 contains 8 B200 GPUs, so it takes 16.8 days on 16 DGX B200s.
A single DGX (8x)B200 node draws about 14.3 kW under full load.
> System Power Usage ~14.3 kW max
source [1] https://www.nvidia.com/en-gb/data-center/dgx-b200
16 x 14.3 kW = ~230 kW
At ~20% solar panel efficiency, we need 230 kW / 0.2 = 1.15 MW of optical power incident on the solar panels.
The required solar panel area becomes (1.15 * 10^6 W) / (1.360 * 10^3 W/m^2) = 846 m^2.
That's about 30 m x 30 m.
From the center of the square solar panel array to the tip of the pyramid it would be 3x30m = 90 m.
An unprecedented feat? Yes. But no physics is being violated here. The parts could be launched serially and then assembled in space. That's a device that can pretrain LLaMa 3.1 from scratch in 16.8 days. It would have way too much memory for LLaMa 3.1: 16 x 8 x 192 GB = ~25 TB of GPU RAM. So this thing could pretrain much larger models, but it would train them more slowly than LLaMa 3.1.
Once up there it enjoys free energy for as long as it survives, no competing on the electrical grid with normal industry, or domestic energy users, no slow cooking of the rivers and air around you, ...
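For anyone who wants to check the arithmetic, here is a minimal Python sketch of the estimate above. The input figures are the ones quoted in this comment (128 GPUs, 14.3 kW per node, ~20% panel efficiency, 1360 W/m^2, 192 GB per GPU); the function name `solar_budget` and the dictionary keys are made up for illustration.

```python
import math

# Figures quoted in the comment above.
NUM_GPUS = 128                # B200 GPUs used for the training run
GPUS_PER_DGX = 8              # GPUs per DGX B200 node
DGX_POWER_KW = 14.3           # max system power per node
PANEL_EFFICIENCY = 0.20       # assumed ~20% solar panel efficiency
SOLAR_CONSTANT_W_M2 = 1360.0  # solar irradiance assumed in the comment
GPU_MEMORY_GB = 192           # memory per B200 GPU


def solar_budget():
    """Reproduce the back-of-the-envelope estimate (hypothetical helper)."""
    nodes = NUM_GPUS // GPUS_PER_DGX
    electrical_kw = nodes * DGX_POWER_KW
    # Incident optical power needed to deliver that electrical power.
    incident_w = electrical_kw * 1000.0 / PANEL_EFFICIENCY
    panel_area_m2 = incident_w / SOLAR_CONSTANT_W_M2
    return {
        "nodes": nodes,
        "electrical_kw": electrical_kw,
        "incident_mw": incident_w / 1e6,
        "panel_area_m2": panel_area_m2,
        "side_m": math.sqrt(panel_area_m2),  # side of a square array
        "gpu_memory_tb": NUM_GPUS * GPU_MEMORY_GB / 1000.0,
    }


if __name__ == "__main__":
    for key, value in solar_budget().items():
        print(f"{key}: {value:.3f}")
```

Without the intermediate rounding this gives roughly 229 kW, 1.14 MW and about 841 m^2 (a square of about 29 m on a side), slightly below the 846 m^2 above because there 228.8 kW was rounded up to 230 kW first.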
- lm28469 1 hour ago
  
  We're talking past each other, I think. In theory we can cool down anything we want; that's not the problem. 16 DGX B200s isn't a datacenter, and it's certainly nowhere close to the figures being discussed (500-1000 TW of AI satellites per year).
  Nobody said sending a single rack and cooling it is technically impossible. We're saying that sending a datacenter's worth of racks is insanely complex and most likely neither financially viable nor currently possible.
  Microsoft just built a datacenter with 4600 racks of GB300. That's 4600 * 1.5 t = 6900 t, which alone weighs more than everything we sent into orbit in 2025, and that's without power or cooling. And we're still far from a single terawatt.
  