Comment by fc417fc802
9 hours ago
If cutting edge were a hard requirement then, given the lead times involved, I think the author would be correct. However, I think there's a fundamental error in failing to account for the fact that you don't need cutting-edge chips to do AI. Sure, they make it cheaper and faster, but they're absolutely not a requirement. You could train a state-of-the-art model on a cluster of 12+ year old boxes (i.e. Intel's 22 nm and DDR3), but if you want to get the job done in a similar timeframe you're going to pay out the ass for electricity. Your research pipeline would necessarily be narrower due to physical and monetary limitations, but that's not the end of the world.
That’s like saying you could train a state-of-the-art model by hand, and it’ll only cost you a lot of man-hours.
Realistically, to train a frontier model you’d need quite a lot of compute. GPT-4, which is old news, was reportedly trained on 25,000 A100s.
There’s just no reasonable way of catching up to modern hardware with old hardware plus time and electricity.
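To put a rough number on that, here is a back-of-envelope sketch. The 25,000 figure is the one quoted above and 312 TFLOP/s is NVIDIA's published dense BF16 peak for the A100. The per-box throughput and power draw for a 22 nm era server are pure guesses on my part (marked ASSUMED), so treat the result as an order-of-magnitude illustration and nothing more.

```python
# Peak-vs-peak comparison only: ignores utilization, memory capacity,
# interconnect, and numeric-format support on the old hardware.

A100_COUNT = 25_000          # figure quoted in the comment above
A100_PEAK_TFLOPS = 312.0     # NVIDIA's published dense BF16 peak for the A100
OLD_BOX_PEAK_TFLOPS = 0.5    # ASSUMED: peak throughput of one 22 nm era server
OLD_BOX_WATTS = 400.0        # ASSUMED: wall power of one such server


def old_boxes_needed(gpu_count, gpu_tflops, box_tflops):
    """Number of old boxes whose combined peak matches the GPU fleet's peak."""
    return gpu_count * gpu_tflops / box_tflops


def fleet_power_gw(box_count, watts_per_box):
    """Total power draw of the fleet in gigawatts."""
    return box_count * watts_per_box / 1e9


boxes = old_boxes_needed(A100_COUNT, A100_PEAK_TFLOPS, OLD_BOX_PEAK_TFLOPS)
power = fleet_power_gw(boxes, OLD_BOX_WATTS)
print(f"old boxes needed: {boxes:,.0f}")
print(f"fleet power: {power:.2f} GW")
```

With those guesses it comes out to about 15.6 million boxes drawing roughly 6 GW, before accounting for anything that makes real-world throughput worse than peak. Change the two ASSUMED constants and the conclusion moves accordingly.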
Training methods and architectures keep getting more efficient by leaps and bounds, and scaling up was well into the realm of diminishing returns last I checked. The necessity of exceeding 100B parameters seems questionable. Just because you can get some benefit by piling ever more data on doesn't mean you have to.
Also keep in mind we aren't talking about a small company wanting to do competitive R&D on a frontier model. We're talking about a world superpower that operates nuclear reactors and built something the size of the Three Gorges Dam deciding that a thing is strategically necessary. If they were willing to spend the money, I am absolutely certain that they could pull it off.
I suspect the bottleneck on 12+ year old hardware wouldn't be power but the interconnects. SOTA training is bound by gradient synchronization latency. Without NVLink you hit a hard wall where the compute spends most of its time waiting on PCIe or Ethernet.
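A rough way to see the interconnect problem is the usual latency-plus-bandwidth cost model for a ring all-reduce: each of the 2(N-1) steps pays one link latency and moves 1/N of the gradient payload. The sketch below is only that textbook model, not a measurement. The function name, the 2 GB payload, and the link figures are illustrative assumptions, and real frameworks overlap communication with compute, which this ignores.

```python
def ring_allreduce_seconds(n_workers, payload_bytes, bandwidth_bytes_per_s, latency_s):
    """Estimated time for one ring all-reduce under a latency + bandwidth model.

    A ring all-reduce runs 2 * (n_workers - 1) steps (reduce-scatter, then
    all-gather); every step sends a chunk of payload_bytes / n_workers.
    """
    if n_workers < 2:
        return 0.0  # nothing to synchronize with a single worker
    steps = 2 * (n_workers - 1)
    chunk_bytes = payload_bytes / n_workers
    return steps * (latency_s + chunk_bytes / bandwidth_bytes_per_s)


# ASSUMED example: 1B parameters in 16-bit precision = 2 GB of gradients, 8 workers.
GRADIENT_BYTES = 2e9
WORKERS = 8

# ASSUMED link figures, chosen only to contrast a fast fabric with a slow one.
fast_link = ring_allreduce_seconds(WORKERS, GRADIENT_BYTES, 300e9, 5e-6)
slow_link = ring_allreduce_seconds(WORKERS, GRADIENT_BYTES, 1.25e9, 50e-6)  # ~10 GbE line rate

print(f"fast link: {fast_link * 1e3:.1f} ms per sync")
print(f"slow link: {slow_link * 1e3:.1f} ms per sync")
```

If a training step's compute takes a few hundred milliseconds, a sync cost in the seconds on the slow link means the processors mostly sit idle, which is the wall described above.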
Fair point. Though if this were actually attempted I imagine it would start with making changes to the model architecture, the physical hardware, or both.
My hypothetical is probably somewhat over the top anyway: isn't China somewhere in the vicinity of 7 nm at present?