Comment by brookst

4 days ago

Yes. It is not incredibly rare; it's incredibly common. A huge percentage of queries to retail LLMs are things like "hello" and "what can you do", and with a static system prompt the total context for those queries is identical.
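To make "identical total context" concrete, here is a minimal sketch of an exact-match cache keyed on a hash of the system prompt plus the user message. It is only an illustration, not a description of how any particular service does it: `ExactMatchCache` and the `generate` callable are hypothetical names, and it assumes that returning a stored response for a repeated context is acceptable (real systems may cache intermediate state instead, or skip caching when sampling is non-deterministic).

```python
import hashlib


class ExactMatchCache:
    """Reuses a stored response when the full context has been seen before."""

    def __init__(self, generate):
        # `generate` is a hypothetical callable: (system_prompt, user_message) -> str
        self._generate = generate
        self._store = {}
        self.hits = 0
        self.misses = 0

    @staticmethod
    def _key(system_prompt, user_message):
        # Hash the whole context; the NUL separator keeps ("ab", "c")
        # from colliding with ("a", "bc").
        h = hashlib.sha256()
        h.update(system_prompt.encode("utf-8"))
        h.update(b"\x00")
        h.update(user_message.encode("utf-8"))
        return h.hexdigest()

    def complete(self, system_prompt, user_message):
        key = self._key(system_prompt, user_message)
        if key in self._store:
            self.hits += 1  # identical context: no model call needed
            return self._store[key]
        self.misses += 1
        response = self._generate(system_prompt, user_message)
        self._store[key] = response
        return response
```

With a static system prompt, every "hello" after the first one lands on the hit path, and the ratio `hits / (hits + misses)` gives a rough upper bound on the work this kind of cache can save.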

It's worth maybe a 3% reduction in GPU usage. So call it a half billion dollars a year or so, for a medium to large service.

    > It's worth maybe a 3% reduction in GPU usage. So call it a half billion dollars a year or so, for a medium to large service.

So if 3% is 500M, then annual spend is ~16.7B. That's medium-sized these days?
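For anyone who wants to check the back-of-envelope math, a quick sketch: it just divides the claimed savings by the claimed fraction. Both inputs (500M and 3%) are the figures quoted above, not measured data, and `implied_annual_spend` is a made-up helper name.

```python
def implied_annual_spend(savings_usd, savings_fraction):
    """Total annual spend implied by a savings amount and the fraction it represents."""
    return savings_usd / savings_fraction


# Figures from the quoted comment: $500M saved at a 3% reduction.
spend = implied_annual_spend(500e6, 0.03)
print(f"${spend / 1e9:.1f}B")  # prints "$16.7B"
```

Running the same function with a smaller savings figure shows how quickly the implied spend drops: 3% savings of $50M would imply roughly $1.7B a year.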