Comment by nottorp
1 day ago
If you work at Google or whatever else is popular or monopolistic this week.
In most real jobs those ten milliseconds will add up to what, 5 seconds to a minute?
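A minimal Python sketch of that arithmetic (the run counts are assumptions, just to bracket the range):

    ms_saved = 10
    # Illustrative invocation counts, not measurements:
    for runs_per_day in (500, 6_000):
        print(f"{runs_per_day} runs/day -> {ms_saved * runs_per_day / 1000:.0f} s/day")
    # 500 runs/day  -> 5 s/day
    # 6000 runs/day -> 60 s/day

So 10 ms only adds up to whole seconds, let alone a minute, at hundreds to thousands of invocations a day.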
There is probably a non-linear function relating how slow your software is to how many users will put up with it.
Those 10 ms may well mean the difference between success and failure... or they may be completely irrelevant. I don't know if this is knowable.
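A toy sketch of what such a function could look like, assuming a logistic drop-off (the 300 ms midpoint and steepness are made up, purely to show the shape):

    import math

    def users_retained(latency_ms, midpoint=300, steepness=0.02):
        # Share of users who tolerate a given latency, modeled as a
        # logistic curve. All parameters are assumptions.
        return 1 / (1 + math.exp(steepness * (latency_ms - midpoint)))

    for ms in (100, 290, 300, 310, 500):
        print(f"{ms:>4} ms -> {users_retained(ms):.0%} retained")
    # Far from the midpoint, 10 ms barely moves the number;
    # near it, the same 10 ms shifts retention by ~5 points.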
There is. But what the OP is doing is not that; it's "scaling", which probably makes sense for whatever they're working on*. For the other 99% of projects, it doesn't.
* ... if they're at ClosedAI or Facebook or something. If they're at some startup selling "AI" solutions that has 10 customers, it may be wishful thinking that they'll reach ClosedAI levels of usage.
It's not really clear to me that the OP is talking about hardware costs. If so, yeah: once you have enough scale, and with a read-only service like an LLM, those costs are perfectly linear.
If it's about saving users' time, it's very non-linear. And if it's not a scalable read-only service, the costs will be very non-linear too.
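For the linear case, a back-of-envelope sketch with assumed traffic and compute prices (none of these numbers are from the thread):

    # Stateless read-only service: compute cost scales with request
    # count, so a fixed per-request saving scales the same way.
    requests_per_day = 50_000_000      # assumed traffic
    cpu_seconds_saved = 0.010          # the 10 ms from the thread
    cost_per_cpu_second = 0.00005      # assumed $/CPU-second

    daily_saving = requests_per_day * cpu_seconds_saved * cost_per_cpu_second
    print(f"${daily_saving:,.2f}/day")
    # -> $25.00/day; double the traffic, double the saving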