Comment by seanmcdirmid
17 hours ago
Is that true though? Data centers can be placed anywhere in the USA. They could be placed near a bunch of hydro or wind farm resources on the western grid, which has little coal anyway outside of one line from Utah to SoCal. The AI doesn’t have to be located anywhere near where it is used, since fiber is probably easier to run than a high-voltage power line.
That was already done years ago and people are predicting that the grid will be maxed out soon.
Build new data centers near sources of power, and grid capacity isn’t going to be a problem. Heck, American industry used to follow that (garment factories were built on fast-moving rivers before electricity was much of a thing; Boeing grew up in the Northwest due to cheap aluminum helped out by hydro). Why is AI somehow different from an airplane?
They'll have to build new power generation and build the data centers next to it.
You are massively conflating what is possible with what is done.
There are many reasons AI datacenters are geographically distributed. Just to list a few top drivers off the top of my head: latency, data sovereignty, resilience, grid capacity, renewable energy availability.
Why does latency matter for a model that responds in tens of seconds? Latency to a datacenter is measured in tens or hundreds of milliseconds, which is 3-4 orders of magnitude less.
Two reasons, as I understand it: 1. not all of these AIs are LLMs, and many have much lower latency SLAs than chat; and 2. these are just one part of a service architecture, and when you have multiple latencies across the stack they tend to have multiplicative effects.
If you look at a model with a diverse, competitive provider set like Llama 3, the latency is 1/4 second, and it will definitely improve, at a minimum incrementally, if quality is held constant: https://artificialanalysis.ai/models/llama-3-3-instruct-70b/... Remember that as long as you experience the response linearly (very much the case for audio output, e.g.), the first-chunk latency is your actual latency, not the time to stream the entire response.
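To make the first-chunk point concrete, here is a rough sketch in Python. The function name and the simple additive model are my own assumptions. The numbers are just the ballpark figures from this thread (about 1/4 second to first chunk, about 10 seconds for a full response, about 100 ms to the datacenter), not measurements.

```python
def perceived_latency(network_s, first_chunk_s, full_response_s, streamed):
    """Seconds the user waits before output starts arriving.

    Hypothetical model: network delay simply adds to model delay.
    Streamed output starts at the first chunk; non-streamed output
    only shows up once the whole response has been generated.
    """
    model_wait = first_chunk_s if streamed else full_response_s
    return network_s + model_wait


# Ballpark figures taken from the thread (assumptions, not measurements).
network = 0.1        # ~100 ms to a distant datacenter
first_chunk = 0.25   # ~1/4 second to first chunk
full = 10.0          # ~10 seconds for the whole response

streamed_wait = perceived_latency(network, first_chunk, full, streamed=True)
blocking_wait = perceived_latency(network, first_chunk, full, streamed=False)

# Fraction of the user's wait that is due to the network alone.
network_share_streamed = network / streamed_wait
network_share_blocking = network / blocking_wait

print(f"streamed: wait {streamed_wait:.2f}s, network share {network_share_streamed:.0%}")
print(f"blocking: wait {blocking_wait:.2f}s, network share {network_share_blocking:.0%}")
```

Under these assumed numbers, the network is a rounding error if you wait for the whole response, but more than a quarter of the wait once the response is streamed, which is why datacenter distance starts to matter.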