wongarsu, 11 days ago:
Which conveniently fits on one 8xH100 machine, with 100-200 GB left over for overhead, KV cache, etc.

storystarling, 11 days ago (reply):
The unit economics seem pretty rough though. You're locking up 8xH100s for the compute of ~32B active parameters. I guess memory is the bottleneck, but it's hard to see how the margins work on that.

kristianp, 10 days ago (reply):
Yes, it only makes sense economically if you have batching over many users.
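
The arithmetic behind the thread can be sketched roughly as follows. This is a back-of-the-envelope illustration, not a measurement: it assumes the 80 GB H100 variant, and the node price, per-user decode speed, and the idea that per-user speed stays flat as the batch grows are all hypothetical placeholders (in practice per-user speed drops as the batch gets larger).

```python
# Rough sketch of the memory budget and batching economics discussed above.
# All prices and throughput numbers are hypothetical placeholders.

H100_MEMORY_GB = 80  # assumption: 80 GB H100 variant
NUM_GPUS = 8


def weight_budget_gb(leftover_gb, num_gpus=NUM_GPUS, per_gpu_gb=H100_MEMORY_GB):
    """Memory available for weights after reserving leftover_gb
    for KV cache and other overhead."""
    return num_gpus * per_gpu_gb - leftover_gb


def cost_per_million_tokens(node_cost_per_hour, tokens_per_sec_per_user, concurrent_users):
    """Cost per million generated tokens for one node.

    Idealized: assumes each user keeps the same decode speed no matter
    how many users share the batch.
    """
    tokens_per_hour = tokens_per_sec_per_user * concurrent_users * 3600
    return node_cost_per_hour / tokens_per_hour * 1_000_000


if __name__ == "__main__":
    # 100-200 GB left over implies roughly this much memory used by weights.
    print(weight_budget_gb(200), weight_budget_gb(100))

    # Hypothetical: a $20/hour node, 50 tokens/s per user.
    for users in (1, 8, 64):
        print(users, round(cost_per_million_tokens(20.0, 50, users), 2))
```

Under these assumptions the cost per token falls in proportion to the number of concurrent users, which is the point about batching: the node costs the same per hour whether it serves one request or many.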