Comment by afr0ck

7 days ago

Inference runs like a stateless web server. If you have 50K or 100K machines, each with a bunch of GPUs (usually 8 per node), you end up with a massive GPU fleet that can run hundreds of thousands, if not millions, of inference instances. On top of that sits something like Kubernetes for scheduling, scaling, and spinning up instances as needed.
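To make that concrete, here's a minimal sketch of what "scaling stateless inference" looks like with the Kubernetes Python client. The deployment name, namespace, and replica count are made up for illustration; real setups would use an autoscaler rather than a manual patch:

```python
from kubernetes import client, config

# Load credentials from ~/.kube/config (use load_incluster_config()
# when running inside the cluster itself).
config.load_kube_config()
apps = client.AppsV1Api()

# Read the current scale of a hypothetical inference Deployment,
# then bump the replica count to meet demand. Each replica is a
# stateless server, so adding more is just spinning up more pods.
scale = apps.read_namespaced_deployment_scale(
    name="llm-inference", namespace="serving"
)
scale.spec.replicas = 200
apps.patch_namespaced_deployment_scale(
    name="llm-inference", namespace="serving", body=scale
)
```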

For storage, they also have massive amounts of hard disks and SSDs behind planet-scale object stores and distributed file systems (like AWS's S3, Meta's Tectonic, or MinIO on-prem), all connected by huge numbers of switches and routers of varying capacity.
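Because everything sits behind an object-store API, a serving node just fetches blobs over the network. A minimal sketch with boto3 against an S3-compatible endpoint (the endpoint, bucket, and key are hypothetical; MinIO speaks the same API, which is why the same code works on-prem):

```python
import boto3

# S3-compatible client; endpoint_url lets the same code talk to
# an on-prem MinIO cluster instead of AWS. Endpoint, bucket, and
# key below are made-up examples.
s3 = boto3.client("s3", endpoint_url="https://minio.internal:9000")

# Pull one (hypothetical) model-weight shard into memory.
obj = s3.get_object(Bucket="model-weights", Key="llama-70b/shard-00.bin")
shard = obj["Body"].read()
print(f"fetched {len(shard)} bytes")
```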

So in the end, it's just the good old cloud, but with GPUs.

Btw, OpenAI's infrastructure is provided and managed by Microsoft Azure.

And, yes, all of this requires billions of dollars to build and operate.