Comment by refulgentis

11 hours ago

Any thoughts on using it on Fireworks? It's extremely fast there.

I'm not sure how many of our requests got routed to Fireworks -- for our testing, we set routing preferences toward providers with the highest advertised quantization and the fullest reasoning-mode support, or preferably the model developer itself.

While it may be possible to get better numbers from certain providers, we try to establish a common baseline. For example, if we measure that Kimi K2.6 averages 450s on a task and GLM 5.1 averages 400s, you might be able to improve Kimi's number on a provider like Fireworks, but GLM 5.1 would also likely be ~10% faster on that same premium provider, so the relative gap would persist. Still, this is a caveat worth considering when comparing against proprietary model speeds on the site.
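To illustrate the point numerically (a sketch using the hypothetical averages above, not real measurements): if a faster provider speeds up every model by roughly the same factor, the relative ranking between models is unchanged.

```python
# Hypothetical per-task averages from the example above (seconds).
baseline = {"Kimi K2.6": 450.0, "GLM 5.1": 400.0}

# Assumption: the premium provider is ~10% faster for *both* models.
provider_speedup = 0.9

premium = {model: t * provider_speedup for model, t in baseline.items()}

# The ratio between the two models is identical under either provider,
# so switching providers does not change the relative comparison.
ratio_baseline = baseline["Kimi K2.6"] / baseline["GLM 5.1"]
ratio_premium = premium["Kimi K2.6"] / premium["GLM 5.1"]
assert abs(ratio_baseline - ratio_premium) < 1e-9
```

Of course, if the speedup factor differs per model on a given provider, this reasoning breaks down, which is exactly why a common baseline is used.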