Comment by refulgentis

11 hours ago

Any thoughts on using it on Fireworks? It's extremely fast there.

I'm not sure how many of our requests got routed to Fireworks -- for our testing, we set routing preferences toward providers with the highest advertised quantization and the fullest reasoning-mode support, or preferably the model developer itself.

While it may be possible to get better numbers from certain providers, we try to establish a common baseline. For example, if we measure that Kimi K2.6 averages 450s on a task and GLM 5.1 averages 400s, you might be able to improve Kimi's number on a provider like Fireworks, but GLM 5.1 would also likely be ~10% faster on that same premium provider, so the relative gap would persist. Still, this is a caveat worth considering when comparing against proprietary model speeds on the site.
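To illustrate the point numerically (a sketch using the hypothetical averages above, not real measurements): if a faster provider speeds up every model by roughly the same factor, the relative ranking between models is unchanged.

```python
# Hypothetical per-task averages from the example above (seconds).
baseline = {"Kimi K2.6": 450.0, "GLM 5.1": 400.0}

# Assumption: the premium provider is ~10% faster for *both* models.
provider_speedup = 0.9

premium = {model: t * provider_speedup for model, t in baseline.items()}

# The ratio between the two models is identical under either provider,
# so switching providers does not change the relative comparison.
ratio_baseline = baseline["Kimi K2.6"] / baseline["GLM 5.1"]
ratio_premium = premium["Kimi K2.6"] / premium["GLM 5.1"]
assert abs(ratio_baseline - ratio_premium) < 1e-9
```

Of course, if the speedup factor differs per model on a given provider, this reasoning breaks down, which is exactly why a common baseline is used.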