Slacker News

Comment by aurareturn

19 hours ago

Are you using 16-bit for inference? How many tokens/second do you get with 8-bit?

Given that SOTA models now use 4-bit inference, can you give an estimate for 4-bit + Blackwell?

mycelia  3 hours ago

Hi! This benchmarking was done w/ DeepSeek-V3's published FP8 weights, so it's already 8-bit. Blackwell performance is still being optimized. SGLang hit 14k tokens/s per B200 though, pretty cool writeup here: https://lmsys.org/blog/2025-09-25-gb200-part-2/
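
For a rough sense of what moving between 16-bit, 8-bit, and 4-bit changes, here is a minimal sketch of the weight footprint arithmetic only. It assumes DeepSeek-V3's published parameter counts (671B total, 37B activated per token) and ignores KV cache, activations, and quantization scale overhead. The function names are illustrative, not from any library. It does not predict tokens/s: halving the bit width halves the weight bytes, which is at best an upper bound on the speedup when decode is memory-bandwidth bound.

```python
# Back-of-envelope weight footprint at different precisions.
# ASSUMPTION: parameter counts are DeepSeek-V3's published figures.
# Ignores KV cache, activations, and per-block quantization scales.

TOTAL_PARAMS = 671e9   # all parameters, including every expert
ACTIVE_PARAMS = 37e9   # parameters activated per token (MoE routing)


def weight_gb(params: float, bits: int) -> float:
    """Bytes needed to store `params` weights at `bits` per weight, in GB (1e9 bytes)."""
    return params * bits / 8 / 1e9


def footprint_table(bit_widths=(16, 8, 4)):
    """Map each bit width to (total weight GB, activated weight GB)."""
    return {
        bits: (weight_gb(TOTAL_PARAMS, bits), weight_gb(ACTIVE_PARAMS, bits))
        for bits in bit_widths
    }


if __name__ == "__main__":
    for bits, (total, active) in footprint_table().items():
        print(f"{bits:>2}-bit: {total:7.1f} GB total, {active:5.1f} GB activated per token")
```

Going from FP8 to 4-bit would cut the stored weights from about 671 GB to about 336 GB under these assumptions; how much of that shows up as throughput depends on kernels, batch size, and interconnect, none of which this sketch models.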
