Comment by brookst
4 days ago
Yes. It's not incredibly rare; it's incredibly common. A huge percentage of queries to retail LLMs are things like "hello" and "what can you do", sent with static system prompts, so the total context is identical from one request to the next.
It's worth maybe a 3% reduction in GPU usage. So call it a half billion dollars a year or so, for a medium to large service.
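To make the "identical total context" point concrete, here is a minimal sketch of an exact-match cache keyed on the full context. Everything in it is hypothetical: the class name `ExactMatchCache`, the `generate` callback, and the choice to cache a whole response (rather than some precomputed model state) are assumptions for illustration, not a description of how any real service does it.

```python
import hashlib


class ExactMatchCache:
    """Hypothetical exact-match cache: same full context -> reuse prior result."""

    def __init__(self):
        self._store = {}
        self.hits = 0
        self.misses = 0

    @staticmethod
    def _key(system_prompt, user_message):
        # Hash the whole context. Each part is length-prefixed so that
        # ("ab", "c") and ("a", "bc") cannot collide.
        h = hashlib.sha256()
        for part in (system_prompt, user_message):
            data = part.encode("utf-8")
            h.update(len(data).to_bytes(8, "big"))
            h.update(data)
        return h.hexdigest()

    def get_or_compute(self, system_prompt, user_message, generate):
        key = self._key(system_prompt, user_message)
        if key in self._store:
            self.hits += 1
            return self._store[key]
        # Cache miss: this is the only path that spends GPU time.
        self.misses += 1
        result = generate(system_prompt, user_message)
        self._store[key] = result
        return result
```

With a static system prompt, every "hello" after the first one would be a hit in a scheme like this, which is where a GPU saving would come from.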
So if 3% is $500M, then annual spend is ~$16.7B. That is medium-sized these days?
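The back-of-envelope check, as a quick sketch. The function name `implied_annual_spend` is made up for illustration, and the inputs are just the rough figures quoted in the thread (3% and half a billion dollars), not measured data.

```python
def implied_annual_spend(savings_usd, savings_fraction):
    """Total spend implied by a saving that is a given fraction of that spend."""
    if not 0 < savings_fraction <= 1:
        raise ValueError("savings_fraction must be in (0, 1]")
    return savings_usd / savings_fraction


# Thread's rough numbers: $500M saved is claimed to be about 3% of GPU spend.
spend = implied_annual_spend(500e6, 0.03)
print(f"${spend / 1e9:.2f}B")  # prints "$16.67B"
```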