Comment by reasonableklout

20 days ago

"ChatGPT needs an array of GPUs per active user" - nit: you're exaggerating by a few orders of magnitude.

First, user queries can be batched together, so a single node can serve hundreds of queries concurrently. Second, people aren't asking ChatGPT questions every second of every day; I'd guess the median user sends single-digit queries per day. Assuming an average response length of 100 tokens and per-stream throughput of 50 tok/s at batch size 50, one node pushes 2,500 tok/s, which is 25 QPS, or about 2.1M queries per day. At 5 queries per user per day, that's roughly 430k users served per node.
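
For anyone who wants to poke at these numbers, here's the arithmetic as a quick sketch. All inputs are the assumptions stated above, not measurements, and real batched throughput varies with model size and sequence length:

    # Back-of-envelope serving capacity, using the assumptions from the comment.
    batch_size = 50                # concurrent streams per node (assumed)
    tok_per_s_per_stream = 50      # decode throughput per stream (assumed)
    tokens_per_response = 100      # average response length (assumed)
    queries_per_user_per_day = 5   # guessed usage rate

    node_tok_per_s = batch_size * tok_per_s_per_stream           # 2,500 tok/s
    qps = node_tok_per_s / tokens_per_response                   # 25 QPS
    queries_per_day = qps * 86_400                               # ~2.16M/day
    users_per_node = queries_per_day / queries_per_user_per_day  # ~432k

    print(f"{qps:.0f} QPS, {queries_per_day / 1e6:.2f}M queries/day, "
          f"~{users_per_node / 1e3:.0f}k users per node")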

Now, a single 8xH100 node costs a lot more than $5/mo, so you're directionally correct there. But I'd wager you can segment your market aggressively and serve heavily distilled/quantized models (small enough to fit on a single commodity GPU, or even a CPU) to your free tier. Finally, all of this is subject to Huang's Law, under which the cost of the same performance more than halves every two years.
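
And a toy projection of that last point, assuming the cost of fixed performance halves exactly every two years (the optimistic reading of Huang's Law; treat the halving period as an assumption, not data):

    # Toy Huang's Law projection: relative cost of the same inference
    # performance over time, with an assumed 2-year halving period.
    def relative_cost(years: float, halving_period: float = 2.0) -> float:
        return 0.5 ** (years / halving_period)

    for y in (0, 2, 4, 6):
        print(f"year {y}: {relative_cost(y):.1%} of today's cost")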