Comment by vikramkr

6 days ago

In addition to measures like that, they also manage load in other ways. There are rate limits, and there's the message Claude used to show almost constantly, something like "demand is high, so you have been automatically switched to concise mode." There's also pricing batch inference cheaper so API customers are nudged toward it instead of real-time replies. The site erroring out during periods of high demand works too, as do prioritizing business customers during a rollout and simply letting the service degrade. It's not as if any provider has a track record of effortlessly keeping responsiveness high. Usually it's the opposite.