Comment by embedding-shape
4 hours ago
I'm not sure you need a "DeepSeek native coding agent" to take advantage of DeepSeeks cache, yesterday as the Codex quota usage issue still wasn't solved for me, I wrote a tiny little bridge so I could use DeepSeek V4 Pro via Codex, and seems most of everything I did was basically cached as far as I can tell: https://i.imgur.com/7eKn6wN.png (2026-05-23 Input (Cache hit): 39,123,200 tokens, Input (Cache miss) 1,692,286), and the bridge is doing not special, just massage the DeepSeek API shape into what Codex expects, nothing particular about caching at all.
Besides being even better at the caching, I'm not sure what benefits you'd get compared to just firing up OpenCode with the DeepSeek API yourself, it'll similarly do caching for sure and also "talks directly to api.deepseek.com" if that matters, and you'll get a much more mature harness.
Yep exactly my thoughts, went and looked at the code for the deepseek provider in my coding agent. and basically all of what the author wrote there is implemented... http://github.com/tontinton/maki for the curios
Opencode has really bad cache stability issues that they seem uninterested in fixing at the moment.
The OpenCode devs talk about this on Twitter a lot, e.g. https://xcancel.com/thdxr/status/2048268697790300343
> tool call pruning breaks cache and people will tell you this is horrible and expensive
> except i looked at some anthropic data and real user behavior ends up with better cache hits and 30% less spend
> even this is needs to be analyzed further, it's just not simple
> for openai data it's inverted! cache hit ratio is actually better [sic: I think he meant worse based on the screenshot] with tool call pruning turned on
> but the net $ saved is only 5%
> kimi is a funny one - it has better cache hits with pruning on...but is also more expensive!
There was also another thread recently where he discussed that pruning improves user experience (models are smarter with less context) but I can't find it.
This can also be disabled in the config: https://opencode.ai/docs/config/#compaction
I can't confirm this. Having utilized Opencode for a large project over the past 10 months, with multiple models and agents, we've never run into such 'cache stability issues'."
That'd be really easy to spot and also fix, most likely. Any open issue you could point us to, must surely been reported already?
> That'd be really easy to spot and also fix, most likely
Ah, reminds me of good old "There are only 2 hard problems in computer science: cache invalidation, naming things, and off-by-1 errors."
3 replies →
Opencode (and other coding agents) have hundreds of open issues reported. It is quite discouraging when they are not being closed/fixed.
1 reply →
[flagged]
This would be a better page to link to https://github.com/esengine/DeepSeek-Reasonix/blob/main/docs...
They explain some of the the reasons why they have a better solution and why they are very opinionated
>Automatic prefix caching activates only when the exact byte prefix of the previous request matches. Most agent loops reorder, rewrite, or inject fresh timestamps each turn — cache hit rate in practice: <20%.
So they optimize on this plus other techniques to improve cache hits, making it cheaper.
> I wrote a tiny little bridge so I could use DeepSeek V4 Pro via Codex
Can you share the bridge. DeepSeek v4 is awesome paired with claude-code or opencode. I found that claude code costs me less than opencode and I am presuming this is due to a better engineered harness.
Sure, keep in mind it's a steaming pile of hacked together hacks, probably won't work in every case, doesn't support every feature that should be supported (like parallel tool calling, both Codex + DeepSeek API support it), and it might make your computer catch on fire: https://gist.github.com/embedding-shapes/eab3e63e5a95d3d78a2...
I only used it for a few hours to play around with stuff before the quota issue was fixed and I could resume using GPT models, and the bridge was coded by DeepSeek-V4-Flash-IQ2XXS + DwarfStar4 locally, I take no responsibility for what might happen with your computer or you, during usage or just reading the code.
Edit: heh, like don't look at line 117 for example where seemingly it likes to handle misspellings in the .env file which totally wasn't my fault for typo'ing the API key in that file... I'm sure there are tons of sharp edges and dumb stuff in there.
LiteLLM can serve OpenAI API endpoint IIRC and proxy that to other providers like DeepSeek, should work with Codex
I’m feeling more a novice every day, but how isn’t this just handing over your code to team deepseek for whatever they might want
Not everyone is working with state secrets or user personal data (or even more closely guarded, company secrets) on a daily basis, most of what I hack on is either FOSS already, or will be, not much to keep secret here.
Obviously, if you do deal with any sort of secrets, then using local LLMs over OpenAI, Anthropic, DeepSeek or whoever is obviously preferred, and in the case of personal data of users, probably a requirement.
1 reply →
You’re not a novice, there are a lot of us who know exactly what we are doing and see this as a huge downside. We are just being told to go faster, faster, faster lest we miss out on… something?
there's laws on the books in China that says that every company operating in China must aid and abet the Chinese government in espionage against the rest of the world. given those facts, I find it deeply troubling to be using anything coming out of China, especially a program that runs in the context of a Linux terminal on a machine that might have something important on it. I'd argue it's a back door waiting to happen, if not sooner than obviously later.
this appears to be native to the terminal, as in, there's no special application that runs or wraps an agent inside a tui. So basically instead of commands you type plain english?
> this appears to be native to the terminal, as in, there's no special application that runs or wraps an agent inside a tui
Same with codex? codex-rs at least, is a TUI as well, it does run a "app-server" in the background, that the TUI actually interacts with, but that's just an implementation detail. Also makes it easy to hook in your own programs to fire of codex "headless" sessions even without the TUI.