Comment by danielhanchen
4 days ago
Oh, it's more that time is the issue: each benchmark takes 1-3 hours ish to run on 8 GPUs, so running every quant for every model release can be quite painful.
Assume AWS spot at, say, $20/hr for 8 B200 GPUs; then it's $20 to $60 ish per quant. If the benchmark runs on BF16, 8-bit, 6, 5, 4, 3 and 2 bits, that's 7 ish tests, so $140 ish to $420 ish per model. Time-wise, 7 hours to 1 day ish.
We could run them after a model release, which might work as well.
This is also for just 1 benchmark.
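To make the back-of-envelope numbers easy to rerun, here is a rough sketch of the same arithmetic in Python. The function name `sweep_cost` and its parameters are made up for illustration; the defaults are just the guesses from the comment ($20/hr for the 8-GPU node, 1-3 hours per run, 7 quants, 1 benchmark), and it assumes the runs go one after another on a single node.

```python
def sweep_cost(num_quants=7, hours_per_run=(1, 3), rate_per_hour=20.0, num_benchmarks=1):
    """Estimate the (low, high) hours and dollars for one model release.

    All defaults are the rough figures quoted above, not measured values.
    Assumes every quant/benchmark pair is one sequential run on the same node.
    """
    runs = num_quants * num_benchmarks
    low_hours = runs * hours_per_run[0]
    high_hours = runs * hours_per_run[1]
    return {
        "hours": (low_hours, high_hours),
        "dollars": (low_hours * rate_per_hour, high_hours * rate_per_hour),
    }


if __name__ == "__main__":
    # BF16, 8, 6, 5, 4, 3, 2 bits = 7 quants, 1 benchmark
    print(sweep_cost())
    # The same sweep over 3 benchmarks scales everything linearly
    print(sweep_cost(num_benchmarks=3))
```

Since everything is linear, adding benchmarks multiplies both the hours and the dollars, which is why the single-benchmark figure above is a lower bound.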