Comment by morpheos137
10 hours ago
Yes, Deepseek V4 is as good as or better than western SOTA models in my experience for practical coding, given an appropriate harness. Cost per solution is certainly cheaper.
Interesting. Can you elaborate on which harness you've tried it with? I'd love to switch to deepseek for my personal use.
Also, which SOTA western models are you comparing it with? Just to give more flavor.
My personal observation (using a mix of opencode and pi harness):
1. DS4Pro: around opus 4.5
2. DS4Flash: around sonnet 4
3. Mimo v2.5 pro: between opus 4.5 and opus 4.6.
4. minimax M3: around opus 4.6
All of these are very close in terms of quality and pricing. For anything that is not specifically related to coding, DS4Flash has become my de facto model. It just works... super fast, tool calling is perfect, and the price is unbeatable. Caching is out of this world. I'm now regularly hitting 90%+ cache hits.
i have been using deepseek-v4-flash since it came out. i use a highly structured harness and spec/test driven workflow running through opencode, and so far there has been nothing it can't do.
i have run through a bunch of tests: re-writing vvenc with assembly kernels, creating the first generation agent harness integration with opencode, porting TS npm modules to C++, porting an entire TS server app to C++, creating a new pure io_uring http server with zero-copy (325K RPS single core), creating a second generation agent from the ground up in C++, setting up a dev environment for custom kernel development on tenstorrent accelerators using tt-metal and ttsim.
i consistently get 98.5% input cache hit ratio. i do see noticeable degradation in performance in the 400-500K context range, so i always try to wrap up sessions by 500K max.
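for anyone who wants to track the same two numbers, here is a rough sketch of how the cache hit ratio and a session wrap-up check could be computed. everything in it is an assumption for illustration: the `Usage` record, its field names, and the 400K warning threshold are hypothetical and not taken from any particular API or harness; map them onto whatever per-turn usage counters your provider reports.

```python
from dataclasses import dataclass


@dataclass
class Usage:
    """Hypothetical per-turn usage record (field names are assumptions)."""
    cache_hit_tokens: int   # input tokens served from the prompt cache
    cache_miss_tokens: int  # input tokens that had to be processed fresh


def cache_hit_ratio(turns):
    """Fraction of all input tokens across a session that were cache hits."""
    hit = sum(t.cache_hit_tokens for t in turns)
    miss = sum(t.cache_miss_tokens for t in turns)
    total = hit + miss
    return hit / total if total else 0.0


def should_wrap_up(context_tokens, limit=400_000):
    """True once the context reaches the (assumed) degradation threshold."""
    return context_tokens >= limit


# Made-up numbers: the first turn is all misses, later turns mostly hits.
turns = [Usage(0, 2_000), Usage(98_000, 1_000), Usage(99_000, 0)]
print(f"cache hit ratio: {cache_hit_ratio(turns):.1%}")
print("wrap up session:", should_wrap_up(450_000))
```

the ratio is computed over total input tokens for the whole session rather than averaged per turn, so a cold first turn does not distort it much once the session gets long.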
a non-intuitive thing is that the model is very good at low-level systems engineering. i suspect this is because they are internally using it to port their stack to huawei hardware. it can churn out exceptionally complex low level C++ stuff that blows your mind, and then completely choke and run in circles on other seemingly simple tasks.
i only use flash and not pro because i want my tooling to be portable to open weights models that are practical to run. i use deepseek platform and not the open weights models for development, because it is subsidized, and based on observation, i think it is highly likely that they are running some proprietary features on the platform which are not in the open weights model.
it will be very interesting to see what their next point release looks like. the compounding effect of optimizing inference cost and then feeding back inference into training should lead to rapid and accelerating improvement, but only time will tell.
I always feel GPT5.5 is better at 'getting the bigger picture' than Chinese models when I am describing something vaguely. What's your experience with that?