Comment by renticulous

7 hours ago

How can something so obvious be overlooked by team building the data centre? Can't the sharding be uneven so that weaker GPUs still finish fast by taking on a smaller workload?

It's not like they had much of an option, when everybody was hoarding every GPU they could. For the second Colossus they could book future production, but the first one had to be built ASAP so xAI looked as a serious competitor in the AI space.

I imagine it involved a petulant billionaire screaming "Fucking build it. Build it NOW!" in response to expert feedback.