Comment by marcosdumay
2 days ago
There is probably a non-linear function relating how slow your software is to how many users will put up with it.
Those 10 ms may very well mean the difference between success and failure... or they may be completely irrelevant. I don't know if this is knowable.
There is. But that's not what the OP is doing; they're "scaling", which probably makes sense for whatever they're working on*. For the other 99% of projects, it doesn't.
* ... if they're at ClosedAI or Facebook or something. If they're at some startup selling "AI" solutions that has 10 customers, it may be wishful thinking that they'll reach ClosedAI levels of usage.
It's not really clear to me that the OP is talking about hardware costs. If so, yeah, once you have enough scale, and for a read-only service like an LLM, those are perfectly linear.
If it's about saving users' time, it's very non-linear. And if it's not a scalable read-only service, the costs will be very non-linear too.
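To make the linear vs. non-linear distinction concrete, here is a rough Python sketch. Everything in it is an assumption for illustration only: the function names, the price per core-hour, and especially the logistic shape and its parameters for user tolerance. Nobody in this thread has measured any of these. It only shows that shaving 10 ms saves the same hardware money at any baseline, while its effect on users depends heavily on where you start.

```python
import math

def hardware_cost(requests_per_sec, latency_ms, cost_per_core_hour=0.05):
    """Hourly cost under a linear model (assumed price, assumed one core per in-flight request)."""
    cores = requests_per_sec * latency_ms / 1000.0  # cores kept busy on average
    return cores * cost_per_core_hour

def users_retained(latency_ms, midpoint_ms=300.0, steepness_ms=50.0):
    """Fraction of users who put up with a given latency (hypothetical logistic curve)."""
    return 1.0 / (1.0 + math.exp((latency_ms - midpoint_ms) / steepness_ms))

# Saving 10 ms from two different starting points, at 1000 requests/s.
for baseline_ms in (105.0, 305.0):
    saved = hardware_cost(1000, baseline_ms) - hardware_cost(1000, baseline_ms - 10)
    gained = users_retained(baseline_ms - 10) - users_retained(baseline_ms)
    print(f"from {baseline_ms} ms: hardware saved/hour = {saved:.4f}, users gained = {gained:.4f}")
```

Under these made-up parameters the hardware saving is identical in both rows, while the user gain is much larger near the assumed tolerance midpoint than far from it. Whether a real product sits near such a midpoint is exactly the part that may not be knowable.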