Comment by ycui7
10 hours ago
You can get 120 TPS (144 peak) with Qwen3.6-27B on an RTX PRO 6000 with AutoRound when MTP is enabled. It runs faster than Sonnet API calls.
A 5090 gets maybe 100 TPS with MTP.