Comment by bavell

4 hours ago

> As a user, I _expect_ the cost of resuming X hours/days later to be no different to resuming seconds or minutes later.

As an informed user who understands his tools, I of course expect large uncached conversations to massively eat into my token budget, since that's how all of the big LLM providers work. I also understand these providers are businesses trying to make money and they aren't going to hold every conversation in their caches indefinitely.

andrewingram 2 hours ago

I'd hazard a guess that there's a large gulf between the number of users who know as much as you and the total number using these tools. The fact that a message can perform wildly differently (in either cost, or behaviour if using one of the mitigations) based on whether I send it at t vs t+1 seems like a major UX issue, especially given that t is very likely not exposed in the UI.
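
To put rough numbers on the t vs t+1 cliff, here is a minimal sketch. Everything in it is an assumption for illustration: the cache lifetime, the two per-token prices, and the `resume_cost` helper are hypothetical and do not reflect any particular provider's pricing or API.

```python
# Hypothetical placeholders, NOT any provider's real pricing or cache policy.
CACHE_TTL_SECONDS = 300        # assumed: cached context expires after 5 idle minutes
PRICE_UNCACHED_PER_TOKEN = 10  # arbitrary cost units per uncached input token
PRICE_CACHED_PER_TOKEN = 1     # assumed discounted rate for cache hits


def resume_cost(context_tokens: int, new_tokens: int, idle_seconds: int) -> int:
    """Input cost of sending one more message after idle_seconds of inactivity."""
    # The existing conversation is billed at the cached rate only while the
    # (assumed) cache entry is still alive.
    cache_hit = idle_seconds < CACHE_TTL_SECONDS
    context_price = PRICE_CACHED_PER_TOKEN if cache_hit else PRICE_UNCACHED_PER_TOKEN
    # The newly typed message is never cached, so it always pays the full rate.
    return context_tokens * context_price + new_tokens * PRICE_UNCACHED_PER_TOKEN


# Same conversation, same message, sent one second apart.
just_before = resume_cost(100_000, 200, idle_seconds=299)
just_after = resume_cost(100_000, 200, idle_seconds=300)
```

Under these made-up numbers the identical message costs roughly ten times more one second later, and nothing on screen tells the user which side of the threshold they are on.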