Comment by glenngillen

4 days ago

We do cache the results locally so that we're not repeatedly hitting our pricing API. The LLM doesn't access that cache directly, though, as it would suffer the token tax you mention. Instead, we optimised our CLI to return agent-optimised results. We're constantly iterating on it, but it already reduces token usage significantly. I wrote about it here: https://www.infracost.io/resources/blog/we-cut-claude-s-toke...

We've found even more improvements since that post, so those will be shipping soon too.
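
For readers who want a concrete picture, here is a rough sketch of the two ideas in this comment: cache pricing lookups locally, and hand the agent a compact summary rather than the raw cache. It is only an illustration under assumptions. `PriceCache`, `compact_summary`, the SKU keys and the `fetch` callable are hypothetical names, not the actual CLI or pricing API.

```python
import json
import time
from typing import Callable, Dict, List, Tuple


class PriceCache:
    """In-memory cache with a time-to-live, keyed by a hypothetical SKU string."""

    def __init__(self, fetch: Callable[[str], float], ttl_seconds: float = 3600.0):
        self._fetch = fetch  # stand-in for a remote pricing API call (assumption)
        self._ttl = ttl_seconds
        self._entries: Dict[str, Tuple[float, float]] = {}  # sku -> (price, stored_at)
        self.remote_calls = 0  # counts how often the stand-in API was hit

    def get(self, sku: str) -> float:
        now = time.monotonic()
        hit = self._entries.get(sku)
        if hit is not None and now - hit[1] < self._ttl:
            return hit[0]  # fresh entry: no remote call needed
        self.remote_calls += 1
        price = self._fetch(sku)
        self._entries[sku] = (price, now)
        return price


def compact_summary(cache: PriceCache, skus: List[str]) -> str:
    """Return only per-SKU costs and a total, as compact JSON for an agent."""
    costs = {sku: round(cache.get(sku), 2) for sku in skus}
    payload = {"costs": costs, "total": round(sum(costs.values()), 2)}
    return json.dumps(payload, separators=(",", ":"))
```

The point of the split is that the agent only ever sees the output of something like `compact_summary`, so the size of the cache never shows up in its context.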

2 comments

eugeneonai 3 days ago

Great, will it be possible to see it in your profile?

glenngillen 3 days ago

I'm not sure I follow. Which profile do you mean? My profile on HN?
I don't know if we'll keep dissecting every incremental improvement we make, as (so far) the general approach is the same as the one documented in the existing blog post: document common use cases -> benchmark them -> identify bottlenecks/expensive hot spots -> fix them -> repeat.
The main thing changing right now is that we're observing new, more frequent use cases (either because we're adding new capabilities or because users are doing things we didn't entirely predict) and adding them to the test cases.
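
The loop described above (document use cases, benchmark them, find the expensive hot spots, fix, repeat) could be sketched roughly like this. It is a toy under stated assumptions: `estimate_tokens` is a crude whitespace word count, not a real tokenizer, and the use-case names and the `budget` threshold are made up for illustration.

```python
from typing import Callable, Dict, List, Tuple


def estimate_tokens(text: str) -> int:
    # Crude proxy (assumption): count whitespace-separated words.
    # A real tokenizer would give different numbers.
    return len(text.split())


def benchmark(use_cases: Dict[str, Callable[[], str]]) -> List[Tuple[str, int]]:
    """Run each documented use case; return (name, token estimate), costliest first."""
    results = [(name, estimate_tokens(run())) for name, run in use_cases.items()]
    return sorted(results, key=lambda item: item[1], reverse=True)


def hot_spots(results: List[Tuple[str, int]], budget: int) -> List[str]:
    """Names of use cases whose estimated output exceeds the token budget."""
    return [name for name, tokens in results if tokens > budget]
```

Newly observed use cases become new entries in the `use_cases` mapping, and the same benchmark is rerun after each fix to check that the hot spot actually shrank.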