← Back to context

Comment by kgeist

3 hours ago

>Latency, throughput, and routes don't matter here. When it's 10 seconds for the first token and then a 1KB/sec streamed response, whatever is fine. You can serve Australia from the US and it'll barely matter.

This may be true for simpler cases where you just stream responses from a single LLM in some kind of no-brain chatbot. If the pipeline is a bit more complex (multiple calls to different models, not only LLMs but also embedding models, rerankers, agentic stuff, etc.), latencies quickly add up. It also depends on the UI/UX expectations.

Funny reading this, because the feature I developed can't go live for a few months in regions where we have to use Amazon Bedrock (for legal reasons), simply because Bedrock has very poor latency and stakeholders aren't satisfied with the final speed (users aren't expected to wait 10-15 seconds in that part of the UI, it would be awkward). And a single roundtrip to AWS Ireland from Asia is already like at least 300ms (multiply by several calls in a pipeline and it adds up to seconds, just for the roundtrips), so having one region only is not an option.

Funny though, in one region we ended up buying our own GPUs and running the models ourselves. Response times there are about 3x faster for the same models than on Bedrock on average (and Bedrock often hangs for 20+ seconds for no reason, despite all the tricks like cross-region inference and premium tiers AWS managers recommended). For me, it's been easier and less stressful to run LLMs/embedders/rerankers myself than to fight cloud providers' latencies :)

>then put all of your data centers there

>You definitely don't need a data center in every continent.

Not always possible due to legal reasons. Many jurisdictions already have (or plan to have) strict data processing laws. Also many B2B clients (and government clients too), require all data processing to stay in the country, or at least the region (like EU), or we simply lose the deals. So, for example, we're already required to use data centers in at least 4 continents, just 2 more continents to go (if you don't count Antarctica :)