Comment by CamperBob2
1 month ago
10 tokens per second (tps), maybe, given the Spark's hobbled memory bandwidth. That's too slow, though. That thread is all about training, which is more compute-intensive.
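For context on why bandwidth rather than compute sets the ceiling: during single-stream decoding, every active weight has to be read from memory once per generated token, so tokens/s can't exceed bandwidth divided by bytes per token. The sketch below is a rough back-of-envelope under stated assumptions, not a benchmark. The inputs (about 273 GB/s for one Spark, about 32B active parameters for an MoE model like Kimi K2, 4-bit weights) are assumptions chosen for illustration, and the function name is hypothetical. It ignores KV-cache reads, interconnect overhead, and how the model is split across nodes.

```python
def decode_tps_upper_bound(bandwidth_gb_s: float,
                           active_params_b: float,
                           bytes_per_param: float) -> float:
    """Memory-bound ceiling on single-stream decode speed, in tokens/s.

    Assumes each generated token streams every active weight from memory
    exactly once. Ignores KV-cache traffic, interconnect, and compute limits.
    """
    bytes_per_token = active_params_b * 1e9 * bytes_per_param
    return bandwidth_gb_s * 1e9 / bytes_per_token


# Hypothetical inputs: ~273 GB/s memory bandwidth, ~32B active params, 4-bit (0.5 byte) weights.
ceiling = decode_tps_upper_bound(273, 32, 0.5)
print(f"theoretical ceiling: {ceiling:.1f} tokens/s")
```

With these assumed numbers the ceiling works out to about 17 tokens/s, and real runs usually land well below the theoretical figure, which is consistent with a guess of around 10 tps.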
A couple of DGX Stations are more likely to work well for what I have in mind. But at this point, I'd be pleasantly surprised if those ever ship. If they do, they will be more like $200K each than $100K.
I linked results where the user ran Kimi K2 across his 8-node cluster. Inference results are listed for 1, 10, and 100 concurrent requests.
Edit to add:
Yeah, those stations with the GB300 look more like what I would want as well, but I agree they're probably way beyond my reach.