Comment by blharr
17 hours ago
It briefly touches on training, but uses a seemingly misleading statistic that, in reference to GPT-4, comes from much smaller models.
This article [1] says that 300 [round-trip] flights are similar to training one AI model. Its reference for an AI model is a study done on five-year-old models like BERT (110M parameters), Transformer (213M parameters), and GPT-2. Considering that models today may be more than a thousand times larger, this is not a credible comparison.
Similar to the logic of "1 mile versus 60 miles in a massive cruise ship"... the article seems to be ironically making a very similar mistake.
[1] https://icecat.com/blog/is-ai-truly-a-sustainable-choice/#:~....
737-800 burns about 3t of fuel per hour. NYC-SFO is about 6h, so 18t of fuel. Jet fuel energy density is 43MJ/kg, so 774000 MJ per flight, which is 215 MWh. Assuming the 60 GWh figure is true (seems widely cited on the internets), it comes down to 279 one-way flights.
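The arithmetic above can be sketched as a quick back-of-envelope script. All inputs are the figures quoted in the comment (approximate fuel burn, flight time, and the widely cited 60 GWh training estimate), not measured values:

```python
# Back-of-envelope check: NYC-SFO flights per one GPT-4-scale training run.
# All constants are the approximate figures from the comment above.
FUEL_BURN_T_PER_H = 3.0           # 737-800 fuel burn, tonnes per hour (approx.)
FLIGHT_HOURS = 6.0                # NYC-SFO one-way flight time (approx.)
ENERGY_DENSITY_MJ_PER_KG = 43.0   # jet fuel energy density
TRAINING_GWH = 60.0               # widely cited GPT-4 training energy estimate

fuel_kg = FUEL_BURN_T_PER_H * FLIGHT_HOURS * 1000   # 18,000 kg of fuel
flight_mj = fuel_kg * ENERGY_DENSITY_MJ_PER_KG      # 774,000 MJ per flight
flight_mwh = flight_mj / 3600                       # 1 MWh = 3600 MJ -> 215 MWh
flights = TRAINING_GWH * 1000 / flight_mwh          # GWh -> MWh, then divide

print(f"{flight_mwh:.0f} MWh per flight, {flights:.0f} one-way flights")
# → 215 MWh per flight, 279 one-way flights
```

This reproduces the 279-flight figure, so the comparison hinges almost entirely on whether the 60 GWh estimate is accurate.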
Thanks, I missed that 60 GWh figure. I got confused by the quotation marks around the statement, so I looked it up and couldn't find a source for the quote. I realize now that he's quoting himself making that statement (and it's quite accurate).
I am surprised that, somehow, the statistic didn't change from the GPT-2 era to GPT-4. Did GPUs really get that much more efficient? Or does that study have some problems?