Comment by nottorp

1 day ago

If you work at Google or whatever else is popular or monopolistic this week.

In most real jobs those ten milliseconds will add up to, what, 5 seconds to a minute?
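
A rough sketch of what that estimate assumes, with made-up numbers just to make it concrete:

    # back-of-envelope: how many 10 ms hits per day add up to 5 s or 1 min,
    # assuming the saving is paid once per user-visible operation
    saving_ms = 10
    ops_for_5_seconds = 5_000 // saving_ms    # 500 operations/day
    ops_for_one_minute = 60_000 // saving_ms  # 6,000 operations/day
    print(ops_for_5_seconds, ops_for_one_minute)  # 500 6000

i.e. somewhere between a few hundred and a few thousand invocations a day, per user.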

There is probably a non-linear relationship between how slow your software is and how many users will put up with it.

Those 10 ms may very well mean the difference between success and failure... or they may be completely irrelevant. I don't know if this is knowable.

  • There is. But what the OP is doing is not that; it's "scaling", which probably makes sense for whatever they're working on*. For the other 99% of projects, it doesn't.

    * ... if they're at ClosedAI or Facebook or something. If they're at some startup with 10 customers selling "AI" solutions, it may be wishful thinking to assume they'll reach ClosedAI levels of usage.

    • It's not really clear to me that the OP is talking about hardware costs. If so, yeah, once you have enough scale and a read-only service like an LLM, those costs are perfectly linear.

      If it's about saving the users' time, it's very non-linear. And if it's not a scalable read-only service, the costs will be very non-linear too.