
Comment by computomatic

7 days ago

This is product design at its finest.

First of all, they never “handle more requests than they have hardware.” That’s impossible (at least as I’m reading it).

The vast majority of usage is via their web app (and free accounts, at that). The web app defaults to “auto” model selection, and the selection algorithm is hidden from the user.

As load peaks, they can divert requests to different tiers of hardware and to less resource-hungry models.

Only a very small minority of requests actually specify the model to use.

There are a hundred similar product design hacks they can use to mitigate load. But this seems like the easiest one to implement.
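
To make that concrete, the routing could be as simple as the sketch below. This is purely illustrative: the tier names, thresholds, and `current_load()` helper are all invented, not anything OpenAI has published.

```python
# Hypothetical sketch of load-aware "auto" model routing. Nothing here
# reflects OpenAI's actual implementation; the model tiers, thresholds,
# and load metric are all made up.

import random

# Models ordered from most to least resource-hungry (invented names).
MODEL_TIERS = ["big-flagship", "mid-size", "small-cheap"]

def current_load() -> float:
    """Stand-in for a real fleet-utilization metric in [0, 1]."""
    return random.random()

def route(requested_model: str | None = None) -> str:
    # The small minority of requests that pin a model keep it
    # (and might get queued/throttled rather than rerouted).
    if requested_model is not None:
        return requested_model

    # "Auto" requests degrade to cheaper tiers as load climbs.
    load = current_load()
    if load < 0.5:
        return MODEL_TIERS[0]   # plenty of headroom: best model
    elif load < 0.8:
        return MODEL_TIERS[1]   # getting busy: mid tier
    return MODEL_TIERS[2]       # peak load: cheapest model
```

The point being: the user still gets a response either way, and since the selection is hidden, nobody can tell they were downgraded.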

> But this seems like the easiest one to implement.

Even easier: Just fail. In my experience the ChatGPT web page fails to display (request? generate?) a response between 5% and 10% of the time, depending on time of day. Too busy? Just ignore your customers. They’ll probably come back and try again, and if not, well, you’re billing them monthly regardless.
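
For what it's worth, "just fail" is itself a standard technique (load shedding): reject cheaply at the front door rather than let every request degrade. A minimal sketch, assuming a simple fixed concurrency cap (the limit and handler names are made up):

```python
# Minimal load-shedding sketch: past a fixed concurrency cap, fail fast
# instead of queueing. The cap and handler are illustrative only.

import asyncio

MAX_IN_FLIGHT = 100                      # hypothetical capacity limit
_slots = asyncio.Semaphore(MAX_IN_FLIGHT)

class Overloaded(Exception):
    """Would map to an HTTP 503 at the edge; the client retries."""

async def generate_response(prompt: str) -> str:
    await asyncio.sleep(0.1)             # stand-in for model inference
    return f"response to {prompt!r}"

async def handle(prompt: str) -> str:
    if _slots.locked():                  # at capacity: reject immediately
        raise Overloaded("try again later")
    async with _slots:
        return await generate_response(prompt)
```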

  • Is this a common experience for others? In several years of reasonable ChatGPT use I have only experienced that kind of failure a couple of times.

    • I don't usually see responses fail. But what I did see shortly after the GPT-5 release (when servers were likely overloaded) was the model "thinking" for over 8 minutes. It seems like, if you manually select the model, you're simply getting throttled or put in a queue (a toy sketch of that kind of queueing follows at the end of the thread).

    • > Is this a common experience for others?

      I should think about whether my experience generalizes.

      The user seems to have had a different experience.

      Stopped reasoning.
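
The toy sketch of the "pinned model just waits in a queue" theory mentioned above, for the curious. The worker count, delay, and names are all invented; nothing here reflects OpenAI's actual serving stack.

```python
# Toy sketch: requests that pin a model aren't rerouted, they just wait
# for a free worker, so under overload the UI shows "thinking" for a
# long time. Entirely speculative.

import asyncio

WORKERS = 2                              # made-up capacity for one model

async def worker(name: str, queue: asyncio.Queue) -> None:
    while True:
        prompt, done = await queue.get()
        await asyncio.sleep(1.0)         # stand-in for slow inference
        done.set_result(f"{name}: reply to {prompt!r}")
        queue.task_done()

async def main() -> None:
    queue: asyncio.Queue = asyncio.Queue()
    workers = [asyncio.create_task(worker(f"gpu-{i}", queue))
               for i in range(WORKERS)]

    # Ten pinned requests arrive at once; most sit in the queue
    # "thinking" until a worker frees up, instead of being rerouted.
    loop = asyncio.get_running_loop()
    futures = []
    for n in range(10):
        done = loop.create_future()
        await queue.put((f"prompt {n}", done))
        futures.append(done)
    for reply in await asyncio.gather(*futures):
        print(reply)

asyncio.run(main())
```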