Comment by stopachka

20 hours ago

I don't quite understand: what would 100K buy you?

AFAIK you would get about 5 concurrent users, with a max context window of ~128K tokens on the larger models.

This wouldn't be good enough for coding -- are you guys thinking of using it for something else?

Gigabyte 4x AMD Instinct MI300A rack server (512GB GPU RAM total)

Roughly equivalent to 4x H200s for less than half the price.
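As a rough sanity check on "roughly equivalent", here is a quick memory-only comparison. The 128 GB per MI300A follows from the 512 GB total quoted above; the 141 GB per H200 is an assumption taken from NVIDIA's published spec, not from this thread. It says nothing about bandwidth or throughput.

```python
# Memory-only comparison; ignores bandwidth, interconnect, and software stack.
MI300A_GB = 512 / 4   # per-device memory implied by the 512 GB total above
H200_GB = 141         # assumption: NVIDIA's published H200 capacity

mi300a_total = 4 * MI300A_GB
h200_total = 4 * H200_GB
ratio = mi300a_total / h200_total

print(f"4x MI300A: {mi300a_total:.0f} GB")
print(f"4x H200:   {h200_total:.0f} GB")
print(f"MI300A box has {ratio:.0%} of the H200 memory")
```

So on capacity alone the two are in the same ballpark, which is all the comparison above claims.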

Vaguely around 60k tokens per second...

By my calculations, 100K could get you 18 5090s plus the compute to host them, or 18 96 GB Mac minis. You can get a lot of context window and users out of that setup.
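For anyone who wants to check the arithmetic, a minimal sketch of how that budget splits. The 18 units and the 96 GB figure come from the comment above; the 32 GB per 5090 is an assumption based on the card's published spec, and the per-unit budget has to cover the host hardware too.

```python
# Back-of-envelope budget split; all prices are implied, not quoted.
BUDGET_USD = 100_000
UNITS = 18

RTX_5090_GB = 32   # assumption: published VRAM of the RTX 5090
MAC_GB = 96        # from the comment above

per_unit_usd = BUDGET_USD / UNITS
total_memory_gb = {
    "18x RTX 5090": UNITS * RTX_5090_GB,
    "18x 96 GB Mac": UNITS * MAC_GB,
}

print(f"Budget per unit (incl. host): ${per_unit_usd:,.0f}")
for name, gb in total_memory_gb.items():
    print(f"{name}: {gb} GB total")
```

Note that the memory in both of these setups is spread across 18 separate devices, so how much of it a single large model can use depends on how well it shards across them.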