Comment by CSSer

19 hours ago

Can you speak a little more to this? I'm curious what kind of parameters one must consider/monitor and what kind of novel things could go wrong.

My guesses are:

Hardware capacity constraints are going to be the big one.

Effective caching is another. I bet if you start hitting cold caches the whole thing's going to degrade rapidly.

The ground is probably shifting pretty rapidly.

Power users are trying to get the most out of their subscriptions and so are hammering you as fast as they possibly can. See Ralph loops.

Harnesses are evolving pretty rapidly, and new alternative harnesses are appearing too. That makes the load patterns less predictable and harder to cache.

Demand is increasing both from more customers and from each user as they figure out more effective workflows.

Users are pretty sensitive to model quality changes. You probably want smart routing, but users want the best model all the time.

Models keep getting bigger and bigger.

On top of that, they are probably hiring and onboarding more people, and both system complexity and codebase complexity are growing.