DeepClaude – Claude Code agent loop with DeepSeek V4 Pro

13 hours ago (github.com)

    #!/bin/sh
    export ANTHROPIC_BASE_URL=https://api.deepseek.com/anthropic
    export ANTHROPIC_AUTH_TOKEN=sk-secret
    export ANTHROPIC_MODEL=deepseek-v4-flash
    export CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC=1
    exec claude "$@"

  • ANTHROPIC_MODEL=deepseek-v4-pro[1m] ANTHROPIC_SUBAGENT_MODEL=deepseek-v4-flash

    This is what I’ve been using for non-confidential projects for about a week now (soon after v4 came out). I honestly can’t tell the difference, but I’m not doing anything crazy with it either.

    Worth noting that I don't think DeepSeek's API lets you opt out of training. Once this is up on other providers though… (OpenRouter is just proxying to DeepSeek atm)

    • For those who don't want their data trained on, OpenRouter allows account-wide or per-request routing with either provider.data_collection: "deny" or zdr: true (zero data retention).

      Also, you can use HuggingFace Inference for DeepSeek V4 or Kimi K2.6, both of which work quite well and route through providers that you can enable/disable (like Together AI, DeepInfra, etc) - you'll have to check their policies but I think most of those commercial inference providers claim to not train on your data either.
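
      For reference, here's roughly what the per-request form of that opt-out looks like against OpenRouter's chat completions endpoint. This is a sketch, not verified against the current API docs; the model slug comes from elsewhere in the thread, and the exact field placement should be checked against OpenRouter's provider-routing documentation.

```shell
# Sketch: ask OpenRouter to route only to providers that don't collect
# or train on your prompts. Per the comment above, "zdr": true (zero
# data retention) is the stricter alternative to data_collection.
# Verify field names/placement in OpenRouter's docs before relying on this.
curl https://openrouter.ai/api/v1/chat/completions \
  -H "Authorization: Bearer $OPENROUTER_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "deepseek/deepseek-v4-pro",
    "messages": [{"role": "user", "content": "hello"}],
    "provider": { "data_collection": "deny" }
  }'
```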

      2 replies →

  • The more interesting part of deepclaude is the local proxy it runs to switch models mid-session and do combined cost tracking. Though these features seem quite buried in the LLM-generated readme. Looking at the history, it appears they were added later, and the readme wasn't restructured to highlight this.

    Also, the author checked in their apparently effective social media advertising plan: https://github.com/aattaran/deepclaude/commit/a90a399682defc... (which seems to be working)

  • thanks, that was super easy.

    I have been wanting to try CC with different models since Opus went downhill last month..

    What limitations or issues have you noticed when using DeepSeek with Claude Code if any?

  • Those who use DeepSeek V4: what level of output do you get? Codex 5.3 level, or GPT 5.4?

    Is the Flash version on the level of GPT 5.4 mini?

    • I tried it on a non-trivial, but also well-documented and self-contained, task. It did amazingly well. I used DeepSeek V4 Pro via the DeepSeek platform. The model is very fast and also super cheap: I burned only 0.06 USD. (I wonder how much the same task would have cost me had I used, e.g., amp.)

      PS: I mention amp because I used to use it and pay directly for tokens. I topped up 5 USD, so I'll keep using it and see how far it can take me. But my impression so far is that even once model subsidization ends, these open-source models are quite viable alternatives.

      7 replies →

I'm not exactly sure what the point of this is. DeepSeek already has instructions for using its API with many CLIs, including Claude Code directly:

https://api-docs.deepseek.com/quick_start/agent_integrations...

  • The readme absolutely buries the features that are actually non-trivial: it runs a proxy to switch models mid-session and does combined cost tracking between Anthropic and other models you might be using. The LLM that wrote the readme never updated the general project description to highlight these features.

    Also the author checked in their advertising plan: https://github.com/aattaran/deepclaude/commit/a90a399682defc...

  • There probably isn't a point. Someone didn't understand something, didn't research it, so they 1 shotted their first thought and sent it to the front page of HN and all of their socials. It's the future bruh

    • I embrace it at this point. It ends all the shilling of vibe coded tools at work that I have endured over the past year. Everyone can now make their own tools with zero obligation to coordinate beyond shared hardware resources

  • It's getting a lot of upvotes, almost as if people were feeling locked in and wanted a way out, but...

    Why would you keep using CC CLI if you want to use the much cheaper DeepSeek v4 models (Flash and Pro): isn't it the opportunity to kiss CC CLI goodbye and use something not controlled by Anthropic?

    Anyone here successfully moved from CC CLI to a fully open-source project? I'm asking this as a Claude Code CLI (Sonnet/Opus) user. My "stack" is all open-source: from Linux to Emacs to what-have-you. I'd rather also have open-weight models and a fully open-source (not controlled by a single company) AI CLI.

    Any suggestion for something that works well? (by "well" I mean "as well as Claude Code CLI", which is not a panacea so my bar ain't the end of the world either).

  • I'm curious how well it actually works. I tried Deepseek with Hermes and Opencode and it seemed extremely bad about using some of the basic tools given, like the Hermes holographic memory tools, even with system prompt instructions strongly pointing them out.

Not sure you can replace Claude with DeepSeek V4 that easily and get the same results.

From what I see while building my own agentic system in Elixir, the problem is in training for your specific harness/contracts. Claude/GPT-style models seem to be trained around very specific contracts used by the harness like tool call formats, planning structure, patching, reading files, recovering from errors, and knowing when to stop.

In practice, you either need a very strong general model that can infer and follow those contracts (expensive), or a weaker model that has been fine-tuned/trained specifically on your own agent contracts. Otherwise, the whole thing becomes flaky very quickly. And I suspect with DeepSeek V4 you get the latter option.

  • There are certainly quirks, but identifying and conforming to those quirks is not that complex. E.g., I had Kimi "fix" my harness to work better with Kimi by pointing it at the (open-source) kimi-cli plus web search and telling it to figure out which differences might matter (it made compaction more aggressive, and worked around some known looping issues by triggering compaction when it spotted looping tool calls). Largely, addressing the quirks tends to harden the harness for other models too. But, yeah, it is more work to make the smaller models work with, instead of against, the harness.

  • I hope they collaborate with open source harness providers (Pi, Opencode) and train models with those. So next generations will have better integration and better overall quality.

  • Idk, my recent experience with Claude is that 4.7 barely knows how to use basic bash tools: how to properly check when programs have finished running, even basic stuff like how to run pytest suites and read the failed tests from the output without re-running the suite specifically to look for them. It's shockingly dumb given all of the tooling they've built into Claude Code (the useless Monitoring tool that blocks the bash polling/sleeping that actually works, etc.).

    I finally got fed up and started using GPT 5.5 over the past 4 days, and it's a breath of fresh air despite feeling much more minimal. With Claude I had to write so many hooks to enforce behaviors it wouldn't remember and lacked common sense about. GPT 5.5 does a much better job with things like knowing the AWS CDK CLI can hang on long CloudFormation deployments and that it should actively check the deployment status via the CloudFormation API rather than hanging for 30+ minutes - and it does this all without asking.

    Maybe there's better tooling built into Codex too, but at least on the surface level it seems like how smart the model is makes a significant difference because Claude has more tools than I can count and still struggles to use "grep".

    Edit: Like just now - I can't tell you how many times a day I see this sequence:

    "Sorry, I'll run in parallel"

    "Error editing file"

    "File must be read first"

    Repeat 10x for the 10 subagents Claude spawned and then it gets stuck until you press escape and it says "You rejected the parallel agents. Running directly now"

  • I'd love to learn more about the system you're building in Elixir, and your learnings, if any of it is public.

>DeepSeek V4 Pro scores 96.4% on LiveCodeBench and costs $0.87/M output tokens

This is a heavily subsidized price and will only last until the end of the month: "The deepseek-v4-pro model is currently offered at a 75% discount, extended until 2026/05/31 15:59 UTC." [0]

The "supported backends" table is also misleading -- while OpenRouter's servers may be in the US, the only way to get the $0.44/$0.87 pricing is to pass through to the DeepSeek API, which of course is China-based. [1]

I do think the model is quite good; I use it myself through Ollama Cloud for simple tasks. But I think some folks have bought into the marketing hype around it a little too much.

[0] https://api-docs.deepseek.com/quick_start/pricing [1] https://openrouter.ai/deepseek/deepseek-v4-pro/providers

  • They expect inference prices to structurally drop once they receive their big batch of Huawei Ascend chips by the second half of the year.

If you're looking for Claude Code alternatives, I would first suggest looking into pi.dev or opencode for your harness. Then for models, you can choose from OpenCode Go (IMO the most cost-effective at the moment), OpenRouter, or direct from DeepSeek. Better yet, IMO, go the Kimi route and just buy a subscription from kimi.com.

  • I liked pi.dev, but why is registering endpoints and models not as simple as possible? Or am I missing something? I always have to fiddle with the config file.

    • Editing config files is not necessary. Just do /login from your session, choose your provider, and there you go.

  • Agreed. OpenCode is a strong base, and with a couple of modifications it can become a very effective harness. For my side project mouse.dev, I've been combining parts from OpenCode, Claude Code, and Hermes to build a cloud agent architecture that works well from mobile.

    • > OpenCode is a strong base, and with a couple modifications it can become a very effective harness.

      I personally didn't find it to be competitive with Claude Code as a harness. Can I ask how you modified it to perform better?

      1 reply →

    • I'm a Claude Code Web fan and a rather heavy user. So I was interested in your product. However, I couldn't find an answer on the website. What parts did you find so good that you ported them?

      1 reply →

  • Another very cost-effective option is Ollama Cloud. In a month of use, I only hit the 5-hour limit once, when I ran 8 agents simultaneously for 2 hours.

  • How does the Kimi subscription compare to Codex and Claude Code in terms of how much mileage you get for the pricing? I mean, I see the prices, but I'm not sure how much usage that buys.

  • > I would first suggest looking into pi.dev

    Looked into this one. Thought it was suspicious that it only had 7 open issues on github. Turns out they have a bot that auto-closes every single issue just because.

    I honestly have no words.

  • >If you're looking for Claude Code alternatives, I would first suggest looking into pi.dev or opencode for your harness.

    While those are nice, Claude Code has the largest number of plugins and skills I want to use.

> Claude Code is the best autonomous coding agent.

If you look at the terminal-bench@2.0 leaderboard, you'll quickly see it's actually one of the weakest agentic harnesses. Anthropic's own models score lower with Claude Code than with virtually any other harness.

So it's quite the opposite. Claude Code is arguably the worst harness to run models with.

It's surprisingly easy to hit $200 worth of tokens even at ~$1/M tokens, though. No matter how many times I do the math, the coding plans are the better value.
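
The arithmetic behind that is simple; here's a sketch with assumed (not the commenter's) numbers:

```shell
# Rough back-of-envelope: heavy agentic use burns tokens fast even at
# cheap per-token prices. All three inputs below are assumptions.
tokens_per_day_m=10   # millions of tokens per working day (assumed)
price_per_m=1         # ~$1 per million tokens (assumed)
days=20               # working days per month (assumed)
echo "$(( tokens_per_day_m * price_per_m * days ))"   # monthly spend in USD
```

At those assumed rates you land at exactly the subscription price of a big coding plan, without the plan's effective per-token discount.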

Just want to say that I ran into this very problem last week. I discovered the OpenCode agent and it works great, with DeepSeek and other models. Try it out, guys.

  • Pi will blow your mind :)

    • No MCP.

      No sub-agents. There are many ways to do this: spawn Pi instances via tmux, build your own with extensions, or install a package that does it your way.

      No permission popups. Run in a container, or build your own confirmation flow with extensions inline with your environment and security requirements.

      No plan mode. Write plans to files, or build it with extensions, or install a package.

      No built-in to-dos. Use a TODO.md file, or build your own with extensions.

      No background bash. Use tmux. Full observability, direct interaction.

If you're okay with Sonnet-level performance, this sounds like a straight upgrade. But I find that Sonnet messes up too much, so it ends up not being worth cost-optimizing down to it or another Sonnet-level model. Glad to have this as an option though.

  • A lot of people are having good experiences doing things like using opus for designing and using locally hosted qwen3.6 for implementation.

    I could see a serious cost reduction story by using opus for design and deepseek for implementation.

    Personally I would avoid anthropic entirely. But I get why people don't.

    • Like me: that’s what I do. Either Opus 4.7 or GLM 5.1 for planning, write it out to a markdown file, then farm it out to Qwen 3.6 27B on my DGX Spark-alike using Pi. Works amusingly well all things considered.

      13 replies →

  • I keep re-learning this lesson: I chug along with a lesser model, then throw a problem at it that's too complex. Then I try different models until I give up and bring in Opus 4.6 to clean up.

    • It's not even that much cheaper, GPT 5.5 is about 2x more expensive per task than Deepseek v4 Pro when you adjust for less token usage, according to Artificial Analysis. Doesn't seem worth it to me.

      2 replies →

  • This is the problem: you need the best model, not just a good one, for:

    - Good architecture, which requires reading specs, code, etc. (reads like: lots of tokens in/out)
    - Bug fixing — same, plus logs, e.g. Datadog

    Once you've found the path, patches are trivial and the savings are tiny unless you're doing refactoring/cleanup.

    Testing gets more and more complicated. Take a look at OpenCode Go, and you see this:

    > Includes GLM-5.1, GLM-5, Kimi K2.5, Kimi K2.6, MiMo-V2-Pro, MiMo-V2-Omni, MiMo-V2.5-Pro, MiMo-V2.5, Qwen3.5 Plus, Qwen3.6 Plus, MiniMax M2.5, MiniMax M2.7, DeepSeek V4 Pro, and DeepSeek V4 Flash

    And now you're on your own with the bugs all of these models can produce at scale. Am I missing anything in this picture? What is the real use of cheaper models?

  • We're not yet at a point of saturation where all the frontier models are of somewhat comparable "intelligence" and we could decide which to use based on other factors (speed, effective context window, etc.), so I honestly don't see why you (as a company or an employee) would not use the best available model with the highest (or at least second-highest) thinking effort. The fees are not exactly cheap, but not that expensive either.

    • Agreed that we're not at saturation, but we don't have a canonical "best" either. For example ChatGPT 5.5 + Codex is, in my experience, vastly superior to Opus 4.7 + Claude Code at sufficiently well-specified Haskell, but equally vastly inferior at correctly inferring my intent. Deepseek may well have its own niche, though I haven't used it enough to guess what it might be.

  • This has been my experience working on tsz.dev. Only Opus 4.7 and GPT 5.5 can really be productive for the remaining test cases.

  • I don't find this with Sonnet at all. As long as I have a solid Claude.md, periodically review the output, and enforce good code practices via basic CI gates, I've rarely found myself having to switch to Opus.

Next Claude news (Trump style): recent versions of Claude Code no longer allow talking to other models, or helping with any code whose goal is moving away from Anthropic models.

I've been using DeepSeek v4 pro as an alternative to Claude models and for the first time I can see it as a real replacement. With the other Chinese models, I was missing something, but DeepSeek seems good enough for the kind of development I want to do.

Is claude code the best coding harness? Anyone running evals on that?

  • In my anecdotal experience, it is not. Same model, opus, works better in 3P harnesses such as Factory Droid or Amp.

    Claude Code, on the other hand, is the most subsidized one, both for consumers (through the Max subscription) and for enterprises (token discounts). It is also heavily optimized for cost, especially token caching and reduced thinking, at the expense of quality.

  • Terminal Bench tests the agent harness.

    The best two are Codex and Forge Code.

    However I am using plugins and skills that are only compatible with Claude Code or work best with Claude Code.

    So, for me, Claude Code with plugins like claude-meme, Context Mode, Superpowers and Get Shit Done is better than other tools.

    I think everyone should test multiple models and multiple agent harnesses for their specific needs, codebase, and way of working.

Obviously vibe-coded (co-authored), plus the prices don't even match.

  • It's going to be real hard to find headlines that weren't vibe coded from here on out unfortunately.

    • Unless I actually know the author I assume everything here is vibeslop and full of mistakes.

      Maybe I need to switch to some news publication that actually does real research and writing still. Because public forums like this have been completely destroyed by LLMs.

    • Welp, pack it in, boys; it was nice conceptualizing all of you as real humans on the internet. I guess I'll just have to go touch grass if I want to feel parasocial.

      4 replies →

Interesting setup.

Do you have any benchmarks on:

- token usage over time
- failure/retry rates

Would be great to see how it behaves in production.

Is there a way to do this directly using the Claude Code CLI (which I already have installed) and OpenRouter?

I just spent half my day getting CUDA and LLAMA to work with my 5070TI.

I was able to use it in agent mode with Roo, I stopped after having it write out a plan, but I'll continue when I have more time.

Deepseek feels less likely to do a straight up rug pull since you can self host with enough money, but I'm still more excited about local solutions.

Usually I just need grunt work done. I'm not solving difficult problems.

Having used a bunch of CLIs with DeepSeek V4, I've found that Langcli is the best fit for it. For programming tasks, the cache hit rate is above 95%.

Not only can it seamlessly and dynamically switch between DeepSeek V4 Flash, V4 Pro, and other mainstream models within the same context, but it is also 100% compatible with Claude Code.

This has become a problem for me. I like trying new things. But I also know that in about a week, there's going to be a better/cheaper setup. And a week after that. And ideally I'd like to get some coding done when I'm not tinkering with the tools.

So I think I'll stay with CC for now.

  • CC can use Ollama as well, including having Ollama proxy to Ollama's cloud models. It's brilliant, and it works with a single Ollama command that doesn't mess with CC at all (so you can run them at the same time).

    If you are interested, I've built an agentic terminal that helps manage these types of things better: https://deepbluedynamics.com/hyperia

You don't need DeepClaude. Claude Code works with any model that exposes an Anthropic-compatible API endpoint.

I am using Claude Code with GLM 5.1, MiniMax M2.7, Kimi K2.6 and Xiaomi MiMo V2.5 Pro.
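
For anyone who hasn't tried it, the setup is the same shape as the script at the top of the thread: point the Anthropic env vars at the provider's Anthropic-compatible endpoint. The URL, token, and model name below are placeholders, not any specific provider's real values:

```shell
#!/bin/sh
# Generic wrapper sketch: any Anthropic-compatible endpoint works with
# the Claude Code CLI. All three values below are placeholders; each
# provider documents its own base URL and model names.
export ANTHROPIC_BASE_URL=https://your-provider.example/anthropic
export ANTHROPIC_AUTH_TOKEN=your-api-key
export ANTHROPIC_MODEL=your-model-name
exec claude "$@"
```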

Honestly with the likes of Opencode / pi / hermes I don't really find the "Claude Code agent loop" part particularly interesting.

The edge Anthropic has over others lies in its models' performance. Its CLI tooling (and obviously pricing) is definitely not better than others'.

Nice, it's quite useful to have a project like this that streamlines the setup necessary to use other "brains" in the Claude Code "body". I'll personally give this a try, but I find the message on pricing a bit disingenuous: the DeepSeek price of "$0.87/M output tokens" is a discount, and this setup still needs a claude.ai subscription offering Claude Code, which is now $100/month minimum.

The wrapper is basically env-var glue. You're still betting the whole loop on Anthropic's closed client.

Can I... somehow run this locally? DeepSeek is opensource? Do I even need their API key?

(I have no experience with running anything locally, maybe it's a stupid question)

  • Waiting for official support in llama.cpp. There is a fork that can run a lightly quantized (Q2 expert layers) DeepSeek V4 Flash in 128GB RAM without offloading weight fetches from disk.
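
    Once support lands, serving it locally would presumably look like any other GGUF model via llama.cpp's server. The model filename here is hypothetical; the flags are standard llama-server options:

```shell
# Sketch, assuming future llama.cpp support: serve a quantized
# DeepSeek V4 Flash GGUF locally. Filename is hypothetical.
llama-server -m deepseek-v4-flash-Q2_K.gguf -c 32768 --port 8080
# llama-server exposes an OpenAI-style HTTP API; Claude Code would
# still need an Anthropic-compatible proxy in front of it.
```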

Why wouldn't you use something open source like OpenCode, which already supports DSv4 and has more features than CC?

  • Coding harnesses make a big difference, and OpenCode is notably less effective than Claude Code (1) in my experience, (2) with the models I've tried it on. (I've not yet tried it with DSv4.)

  • As someone who does use other models with CC, I am curious about opencode, what extra features does it have that you find essential?

    • I like being able to add a wide array of models, define perms for agents and subagents, turn MCPs on and off at will, and be able to fix bugs I find in it.

      2 replies →

  • If using something open source, I'd say Forge Code has better results than Open Code, at least according to Terminal Bench.

  • More features than CC?

    Also, opencode tracks you by default. It's not safe: every first prompt you send is routed through their servers and logged, and they can use your data however they want.

Anthropic messed up big time: the harness works with any muh commodity LLM. Meanwhile, VCs were duped by the myth of FOOM AGI. Probably not a coincidence that Anthropic is enmeshed with the sci-fi fan-fic forum known as LessWrong. The world wants useful tools. The Bay Area bubble, in contrast, thrives on Mythos.

  • I think OAI and Anthropic will be OK for a year or two. But after that, if they still depend on revenues from selling tokens to firms/software engineers, they will be in serious trouble.

    The American firms are not demonstrating escape velocity, and as long as China offers something somewhat comparable at a very low price to compensate for any difference in quality, they will not generate enough cash flow to finance reinvestment. I highly doubt they'll be able to keep raising external financing for many more periods from here on out: they have to start showing strong financials and that they are pulling away from the open-source models.

    • The performance gap will likely close as Chinese hardware improves. This is happening very rapidly.

      Already DeepSeek v4 is being hosted on Huawei Ascend 950. What do you think those cost relative to NVIDIA gear?

    • I wouldn't put it past the US government to ban foreign models. They tried to ban TikTok. What is being demonstrated here is that Silicon Valley cannot withstand a competitive market.

      2 replies →

  • > anthropic messed up big time harness works with any muh commodity LLM

    That surprised me too. The intelligence is at the client, and by making that open, Anthropic has commoditized the coding agent.

I had a call with our CTO and we are pivoting away from Claude Code to DeepClaude because the cost savings are too substantial to ignore.