Comment by maeil
18 hours ago
The section on training feels weak, and that's what the discussion is mainly about.
Many companies are now trying to train models as big as GPT-4, and OpenAI is training models (o1 and o3) that may well be much larger than GPT-4. Framing training as a one-time cost doesn't seem accurate - it doesn't look like the big companies will stop training new models any time soon; they'll keep doing it. So one model might only be used for half a year, and many models may never be deployed at all. This might stop at some point, but that's hypothetical.
It briefly touches on training, but uses a seemingly misleading statistic derived from models far smaller than GPT-4.
The cited article [1] says that training one AI model is comparable to 300 round-trip flights. But its reference point for "an AI model" is a study of five-year-old models like BERT (110M parameters), the original Transformer (213M parameters), and GPT-2. Considering that today's models may be more than a thousand times larger, this is not a credible comparison.
It's the same flawed logic as comparing "1 mile versus 60 miles in a massive cruise ship" - ironically, the article seems to be making the very mistake it criticizes.
[1] https://icecat.com/blog/is-ai-truly-a-sustainable-choice/#:~....
A 737-800 burns about 3 t of fuel per hour. NYC-SFO is about 6 h, so 18 t of fuel. Jet fuel's energy density is 43 MJ/kg, so 774,000 MJ per flight, which is 215 MWh. Assuming the 60 GWh figure is true (it seems widely cited on the internets), that comes out to 279 one-way flights.
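The arithmetic above can be sketched as a quick script. All inputs are the assumptions stated in the comment (3 t/h fuel burn, a 6 h flight, 43 MJ/kg energy density, and the widely cited but unverified ~60 GWh training estimate), not authoritative figures:

```python
# Back-of-envelope check of the flight-vs-training energy comparison.
# Figures are the comment's assumptions, not verified data.
fuel_per_hour_t = 3            # 737-800 fuel burn, tonnes per hour (assumed)
flight_hours = 6               # NYC-SFO one way (assumed)
energy_density_mj_per_kg = 43  # jet fuel energy density

fuel_kg = fuel_per_hour_t * 1000 * flight_hours      # 18,000 kg of fuel
flight_mj = fuel_kg * energy_density_mj_per_kg       # 774,000 MJ
flight_mwh = flight_mj / 3600                        # 1 MWh = 3600 MJ -> 215 MWh

training_mwh = 60_000          # the widely cited 60 GWh figure, in MWh
flights = training_mwh / flight_mwh

print(round(flight_mwh), round(flights))  # 215 279
```

This reproduces the 279-flight figure; the result scales linearly with the 60 GWh assumption, so a different training estimate shifts the flight count proportionally.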
Thanks, I missed that 60 GWh figure. I got confused by the quotes around the statement, so I looked it up and couldn't find the original quote. I realize now that he's quoting himself making that statement (and it's quite accurate).
I am surprised that, somehow, the statistic didn't change from the GPT-2 era to GPT-4. Did GPUs really get that much more efficient? Or does that study have some problems?
I am sure that’s intentional, because this article is the same thing we see from e/acc personalities any time the environmental impact is brought up.
It's deflection away from what actually uses power, pretending the entire system is just an API like any other.