DeepSeek-v3.1

2 days ago (api-docs.deepseek.com)

For local runs, I made some GGUFs! You need around RAM + VRAM >= 250GB for good performance with the dynamic 2-bit quant (2-bit MoE layers, 6-8-bit for the rest). You can also offload to SSD, but it'll be slow.

./llama.cpp/llama-cli -hf unsloth/DeepSeek-V3.1-GGUF:UD-Q2_K_XL -ngl 99 --jinja -ot ".ffn_.*_exps.=CPU"

More details on running + optimal params here: https://docs.unsloth.ai/basics/deepseek-v3.1

  • > More details on running + optimal params here: https://docs.unsloth.ai/basics/deepseek-v3.1

    Was that document almost exclusively written with LLMs? I looked at it last night (~8 hours ago) and it was riddled with mistakes; the most egregious was that the "Run with Ollama" section had instructions for how to install Ollama, but then the shell commands were actually running llama.cpp, a mistake probably no human would make.

    Do you have any plans on disclosing how much of these docs are written by humans vs not?

    Regardless, thanks for the continued release of quants and weights :)

    • Oh hey, sorry, the docs are still under construction! Are you referring to merging GGUFs for Ollama? It should work fine, i.e.:

      ```
      ./llama.cpp/llama-gguf-split --merge \
        DeepSeek-V3.1-GGUF/DeepSeek-V3.1-UD-Q2_K_XL/DeepSeek-V3.1-UD-Q2_K_XL-00001-of-00006.gguf \
        merged_file.gguf
      ```

      Ollama only accepts merged GGUFs (not split ones), hence the command.

      All docs are written by humans (primarily my brother and me); there might just be some typos here and there (sorry in advance).

      I'm also uploading Ollama-compatible versions directly so `ollama run` can work (it'll take a few more hours).
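
      For reference, once you have the merged file, a rough sketch of wiring it into Ollama looks like this (the model name is just a placeholder):

          cat > Modelfile <<'EOF'
          FROM ./merged_file.gguf
          EOF
          ollama create deepseek-v3.1-local -f Modelfile
          ollama run deepseek-v3.1-local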

    • > but then the shell commands were actually running llama.cpp, a mistake probably no human would make.

      But in the docs I see things like

          cp llama.cpp/build/bin/llama-* llama.cpp
      

      Wouldn't this explain that? (Didn't look too deep)

  • By the way, I'm wondering why unsloth (a goddamn python library) tries to run apt-get with sudo (and fails on my NixOS). Like how tf are we supposed to use that?

    • Oh hey I'm assuming this is for conversion to GGUF after a finetune? If you need to quantize to GGUF Q4_K_M, we have to compile llama.cpp, hence apt-get and compiling llama.cpp within a Python shell.

      There is a way to convert to Q8_0, BF16, or F16 without compiling llama.cpp; it's enabled if you use `FastModel` rather than `FastLanguageModel`.

      Essentially I try `sudo apt-get`; if that fails, plain `apt-get`; and if that also fails, it just fails. We need `build-essential cmake curl libcurl4-openssl-dev`.

      See https://github.com/unslothai/unsloth-zoo/blob/main/unsloth_z...
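
      In shell terms, the fallback is roughly this (just a sketch, not the actual unsloth-zoo code):

          # try a privileged install first, then an unprivileged one, then give up
          sudo apt-get install -y build-essential cmake curl libcurl4-openssl-dev \
            || apt-get install -y build-essential cmake curl libcurl4-openssl-dev \
            || echo "could not install build deps, please install them manually" >&2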

      65 replies →

    • hey fellow crazy person! slight tangent: one thing that helps keep me grounded with "LLMs are doing much more than regurgitation" is watching them try to get things to work on nixos - and hitting every rake on the way to hell!

      nixos is such a great way to expose code doing things it shouldn't be doing.

      3 replies →

  • Thanks for your great work with quants. I would really appreciate UD GGUFs for V3.1-Base (and even more so, GLM-4.5-Base + Air-Base).

    • Thanks! Oh base models? Interesting since I normally do only Instruct models - I can take a look though!

  • It’d also be great if you guys could do a fine tune to run on an 8x80G A/H100. These H200/B200 configs are harder to come by (and much more expensive).

    • Unsloth should work on any GPU setup, all the way from the old Tesla T4s to the newer B200s :) We're working on a faster and better multi-GPU version, but using accelerate/torchrun manually with Unsloth should work out of the box!
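
      A minimal sketch of what launching that looks like (the script name and process count are placeholders for your own setup):

          # one process per GPU on a single 8-GPU node
          accelerate launch --num_processes 8 finetune_unsloth.py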

      1 reply →

  • For such a dynamic 2-bit quant, are there any benchmark results showing how much performance I would give up compared to the original model? Thanks.

    • If you are running a 2-bit quant, you are not giving up performance but gaining 100% of it, since the alternative is usually 0%. Smaller quants are for folks who otherwise wouldn't be able to run anything at all, so you run the largest quant you can relative to your hardware. I, for instance, often ran Q3_K_L; I don't think of how much performance I'm giving up, but rather that without Q3 I wouldn't be able to run it at all. With that said, for R1 I did some tests against two public interfaces and my local Q3 crushed them. The problem with a lot of model providers is that we can never be sure what they are serving up, and they could take shortcuts to maximize profit.

      5 replies →

For reference, here is the terminal-bench leaderboard:

https://www.tbench.ai/leaderboard

Looks like it doesn't get close to GPT-5, Claude 4, or GLM-4.5, but still does reasonably well compared to other open weight models. Benchmarks are rarely the full story though, so time will tell how good it is in practice.

Looks to be the ~same intelligence as gpt-oss-120B, but about 10x slower and 3x more expensive?

https://artificialanalysis.ai/models/deepseek-v3-1-reasoning

  • > same intelligence as gpt-oss-120B

    Let's hope not, because gpt-oss-120B can be dramatically moronic. I am guessing the MoE contains some very dumb subnets.

    Benchmarks can be a starting point, but you really have to see how the results work for you.

  • My experience is that gpt-oss doesn't know much about obscure topics, so if you're using it for anything except puzzles or coding in popular languages, it won't do as well as the bigger models.

    Its knowledge seems to be lacking even compared to GPT-3.

    No idea how you'd benchmark this though.

    • > My experience is that gpt-oss doesn't know much about obscure topics

      That is the point of these small models. Remove the bloat of obscure information (address that with RAG), leaving behind a core “reasoning” skeleton.

      1 reply →

    • Something I was doing informally that seems very effective is asking for details about smaller cities and towns and lesser points of interest around the world. Bigger models tend to have a much better understanding and knowledge base for the more obscure places.

      3 replies →

It's a hybrid reasoning model. It's good with tool calls and doesn't overthink everything, but it regularly uses outdated tool-call formats at random instead of the standard JSON format. I guess the V3 training set has a lot of those.

  • What formats? I thought the very schema of JSON is what allows these LLMs to enforce structured outputs at the decoder level? I guess you can do it with any format, but why stray from JSON?

    • Sometimes it will randomly generate something like this in the body of the text:

      ```
      <tool_call>executeshell
      <arg_key>command</arg_key>
      <arg_value>echo "" >> novels/AI_Voodoo_Romance/chapter-1-a-new-dawn.txt</arg_value>
      </tool_call>
      ```

      or this:

      ```
      <|toolcallsbegin|><|toolcallbegin|>executeshell<|toolsep|>{"command": "pwd && ls -la"}<|toolcallend|><|toolcallsend|>
      ```

      Prompting it to use the right format doesn't seem to work. Claude, Gemini, GPT-5, and GLM 4.5 don't do that. To accommodate DeepSeek, the tiny agent that I'm building will have to support all the weird formats.

      3 replies →

    • In the strict modes in APIs, the sampling code essentially rejects and re-inferences any sampled token that wouldn't create valid JSON under a grammar built from the schema. Generally, the training is doing 99% of the work, of course; "strict" just means "we'll check its work to the point that a GBNF grammar created from the schema will validate."
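
      The same idea shows up in llama.cpp via GBNF grammars, e.g. with the generic JSON grammar it ships with (just an illustration of grammar-constrained sampling, not what the hosted APIs actually run; `model.gguf` is a placeholder):

          # constrain sampling so the output must parse as JSON
          ./llama.cpp/llama-cli -m model.gguf \
            --grammar-file llama.cpp/grammars/json.gbnf \
            -p "Return a JSON object describing today's weather in Paris."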

      One of the funnier info scandals of 2025 has been that only Claude was even close to properly trained on JSON file edits until o3 was released, and even then it needed a bespoke format. Geminis have required using a non-formalized diff format by Aider. It wasn't until June that Gemini could do diff-string-in-JSON better than 30% of the time, and until GPT-5 that an OpenAI model could. (Though v4a, as OpenAI's bespoke edit format is called, is fine because it at least worked well in tool calls. Gemini's was a clown show: you had to post-process regular text completions to parse out any diffs.)

      6 replies →

It seems behind Qwen3 235B 2507 Reasoning (which I like) and gpt-oss-120B: https://artificialanalysis.ai/models/deepseek-v3-1-reasoning

Pricing: https://openrouter.ai/deepseek/deepseek-chat-v3.1

  • Those Qwen3 2507 models are the local crème de la crème right now. If you've got any sort of GPU and ~32GB of RAM to play with, the A3B one is great for pair-programming tasks.

  • I too like Qwen a lot; it's one of the best models for programming. I generally use it via the chat.

Not sure if it's just chat.deepseek.com, but one strange thing I've noticed is that it now replies to like 90% of your questions with "Of course.", even when it doesn't fit the prompt at all. Maybe it's the backend injecting it to be more obedient? But you can tell it `don't begin the reply to this with "of" ending "course"` and it will listen. It's very strange.

Some people on Reddit (very reliable source, I know) are saying it was trained on a lot of Gemini output, and I can see that. For example, it does that annoying thing Gemini does now where, when you use slang or really any informal terms, it puts them in quotes in its reply.

  • > for example it does that annoying thing gemini does now where when you use slang or really any informal terms it puts them in quotes in its reply

    Haven't used Gemini much, but the time I used it, it felt very academic and theoretical compared to Opus 4. So that seems to fit. But I'll have to do more evaluation of the non-Claude models to get a better idea of the differences.

    • All this points to "personality" being a big -- and sticky -- selling point for consumer-facing chat bots. People really did like the chatty, emoji-filled persona of the previous ChatGPT models. So OpenAI was ~forced to adjust GPT-5 to be closer to that style.

      It raises a funny "innovator's dilemma" that might happen. Where an incumbent has to serve chatty consumers, and therefore gets little technical/professional training data. And a more sober workplace chatbot provider is able to advance past the incumbent because they have better training data. Or maybe in a more subtle way, chatbot personas give you access to varying market segments, and varying data flywheels.

Seems to hallucinate more than any model I've ever worked with in the past 6 months.

  • DeepSeek is bad for hallucinations in my experience. I wouldn't trust its output for anything serious without heavy grounding. It's great for fantastical fiction though. It also excels at giving characters "agency".

It's a very smart move for DeepSeek to put out an Anthropic-compatible API, similar to Kimi K2 and GLM-4.5 (puzzled as to why Qwen didn't do this). You can set up a simple function in your .zshrc to run Claude Code with these models:

https://github.com/pchalasani/claude-code-tools/tree/main?ta...
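
A minimal sketch of such a function (ANTHROPIC_BASE_URL / ANTHROPIC_AUTH_TOKEN / ANTHROPIC_MODEL are the env vars Claude Code reads; the DeepSeek endpoint and model name here are assumptions, so check the linked repo for the exact values):

    # hypothetical ~/.zshrc helper
    dsclaude() {
      ANTHROPIC_BASE_URL="https://api.deepseek.com/anthropic" \
      ANTHROPIC_AUTH_TOKEN="$DEEPSEEK_API_KEY" \
      ANTHROPIC_MODEL="deepseek-chat" \
      claude "$@"
    }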

  • Wow, thanks! I ran into my Claude Code session limit like an hour ago, tried the method you linked, and added 10 CNY to a DeepSeek API account; an hour later I've got 7.77 CNY left and have used 3.3 million tokens.

    I'm not confident enough to say it's as good as claude opus or even sonnet, but it seems not bad!

    I did run into an api error when my context exceeded deepseek's 128k window and had to manually compact the context.

Sad to see the off-peak discount go. I was able to crank tokens like crazy and not have it cost anything. That said, the pricing is still very, very good, so I can't complain too much.

So, is the output price there why most models are extremely verbose? Is it just a ploy to make extra cash? It's super annoying that I have to constantly tell it to be more and more concise.

  • > It's super annoying that I have to constantly tell it to be more and more concise.

    While system prompting is the easy way of limiting the output in a somewhat predictable manner, have you tried setting `max_tokens` when doing inference? For me that works very well for constraining the output: if you set it to 100 you get very short answers, while if you set it to 10,000 you can get very long responses.
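
    For example, against DeepSeek's OpenAI-compatible endpoint it's just one field in the request body (a sketch; swap in whatever model/endpoint you're actually calling):

        curl https://api.deepseek.com/chat/completions \
          -H "Authorization: Bearer $DEEPSEEK_API_KEY" \
          -H "Content-Type: application/json" \
          -d '{
                "model": "deepseek-chat",
                "messages": [{"role": "user", "content": "Explain MoE routing briefly."}],
                "max_tokens": 100
              }'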

Is it good at tool use? For me tool use is table stakes; if a model can't use tools, then it's almost useless.

Looks quite competitive among open-weight models, but I guess it's still well behind GPT-5 or Claude.

This might be OT and covered somewhere else, but what's the latest/greatest on these models and their effect on the linguistics field, and conversely, what do the latest and greatest in linguistics think about these models?

Cries in 128k context. Probably will be a good orchestrator though, can always delegate to Gemini.

It still can't name all the states in India.

  • That's interesting. I am curious about the extent of the training data in these models.

    I asked Kimi K2 for an account of growing up in my home town in Scotland, and it was ridiculously accurate. I then asked it to do the same for a similarly sized town in Kerala. ChatGPT suggested that while it was a good approximation, K2 got some of the specifics wrong.

Cheap!

$0.56 per million tokens in — and $1.68 per million tokens out.

  • The next cheapest capable model is GLM 4.5 at $0.6 per million tokens in and $2.2 per million tokens out. Glad to see DeepSeek is still the value king.

    But I am still disappointed with the price increase.

how can deepseek be so cheap* yet so effective?

*pricing (deepseek-chat / deepseek-reasoner):
  1M input tokens (cache hit): $0.07
  1M input tokens (cache miss): $0.56
  1M output tokens: $1.68

  • I think it's a combination of the MoE model architecture and inference being done in large batches run in parallel.
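
    To put rough numbers on the MoE part (using DeepSeek's published V3 figures, so treat this as an approximation): the model has ~671B total parameters but only ~37B are activated per token, so per-token compute is roughly that of a 37B dense model, while large parallel batches amortize the memory cost of keeping all the experts loaded across many concurrent requests.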

Hmm. It’s still not close to paid frontier on SWE bench.

  • In my experience, Qwen 3 coder has been very good for agentic coding with Cline. I tried DeepSeek v3.1 and wasn't pleased with it.

Just saw this on the Chinese internet: DeepSeek officially mentioned that v3.1 is trained using UE8M0 FP8, as that is the FP8 format to be supported by the next-gen Chinese AI chip. So basically:

some Chinese next-gen AI chips are coming, and DeepSeek is working with them to get its flagship model trained using such domestic chips.

Interesting times ahead! Just imagine what it could do to NVIDIA's share price when DeepSeek releases a SOTA new model trained without using NVIDIA chips.

  • Time to short Nvidia?

    • No, because people never really talk about the quantity of the alternatives -- i.e. Huawei Ascend. Even if Huawei can match the quality, their yields are still abysmal. The numbers I've heard are in the hundreds of thousands vs. millions by Nvidia. In the near future, Nvidia's dominance is pretty secure. The only thing that can threaten it is if this whole AI thing isn't worth what some people imagined it is worth and people start to realize this.

    • There's no evidence that v3.1 was trained on Chinese chips (they said it very ambiguously; they only said they adapted the model for Chinese chips, which could mean training or inference).

      Anyway, from my experience, if China really had advanced AI chips for SOTA models, I'm sure the propaganda machine would go all out; look how they boasted about the Huawei CPU that's two generations behind Qualcomm and TSMC.

They say the SWE-bench Verified score is 66%; Claude Sonnet 4 is 67%. Not sure if the 1% difference here is statistically significant or not.
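
For rough scale: SWE-bench Verified has about 500 tasks, so the standard error on a 66% score is roughly sqrt(0.66 × 0.34 / 500) ≈ 2 points, which puts a 1-point gap well within the noise.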

I'll have to see how things go with this model after a week, once the hype has died down.

[flagged]

  • Every country acts in its own best interest; the US is not unique in this regard.

    Wait until you find out that China also acts the same way toward the rest of the world (surprise Pikachu face).

  • This does not make any sense to me. “There”? “‘Nationalist’ () bans” of and by whom?

    Dark propaganda opposed to what, light propaganda? The Chinese model being released is about keeping China down?

    You seem very animated about this, but you would probably have more success if you tried to clarify this a bit more.

Reminder: DeepSeek is a Chinese company whose head start is attributed to stealing IP from American companies. Without the huge theft, they'd be nowhere.

  • As if those American companies played fair with training their AIs.

    It's theft all the way down, son

  • I can't say whether those claims are true. But even if they were, it feels selective. Every major AI company trained on oceans of data they didn't create or own. The whole field was built on "borrowing" IP, open-source code, academic papers, datasets, art, text, you name it.

    Drawing the line only now... saying this is where copying stops being okay doesn't seem very fair. No AI company is really in a position to whine about it from my POV (ignoring any lawyer POV). Cue the world's smallest violin

  • Can you contrast this with Western companies? What are the Chinese companies stealing that Western companies aren’t? Do you mean tech or content?

    • Ethics of Chinese vs. Western companies? Everything. I'm sure you're aware of how many hundreds of $billions of American IP are stolen by Chinese companies.

      4 replies →

  • I find it hilarious you felt the need to make this comment in defense of American LLMs. You know that American LLMs aren’t trained ethically either, right? Many people’s data was used for training without their permission.

    BTW DeepSeek has contributed a lot, with actual white papers describing in detail their optimizations. How are the rest of the American AI labs doing in contributing research and helping one another advance the field?

  • Reminder that OpenAI is an American company whose head start is attributed to stealing copyrighted material from everyone else. Without the huge theft, they'd be nowhere.

    • Last I checked, as it concerns the training of their models, all legal challenges are pending. No theft has yet been proven, as they used publicly available data.

      3 replies →

  • If an American company did this, it would be "innovative bootstrapping". Yawn.