For local runs, I made some GGUFs! You need around RAM + VRAM >= 250GB for good perf for dynamic 2bit (2bit MoE, 6-8bit rest) - can also do SSD offloading but it'll be slow.
./llama.cpp/llama-cli -hf unsloth/DeepSeek-V3.1-GGUF:UD-Q2_K_XL -ngl 99 --jinja -ot ".ffn_.*_exps.=CPU"
More details on running + optimal params here: https://docs.unsloth.ai/basics/deepseek-v3.1
> More details on running + optimal params here: https://docs.unsloth.ai/basics/deepseek-v3.1
Was that document almost exclusively written with LLMs? I looked at it last night (~8 hours ago) and it was riddled with mistakes, the most egregious being that the "Run with Ollama" section had instructions for how to install Ollama, but then the shell commands were actually running llama.cpp, a mistake probably no human would make.
Do you have any plans on disclosing how much of these docs are written by humans vs not?
Regardless, thanks for the continued release of quants and weights :)
Oh hey sorry the docs are still in construction! Are you referring to merging GGUFs to Ollama - it should work fine! Ie:
```
./llama.cpp/llama-gguf-split --merge \
  DeepSeek-V3.1-GGUF/DeepSeek-V3.1-UD-Q2_K_XL/DeepSeek-V3.1-UD-Q2_K_XL-00001-of-00006.gguf \
  merged_file.gguf
```
Ollama only accepts merged GGUFs (not split ones), hence the command.
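Once merged, pointing Ollama at the file is just a Modelfile away - something like this (a rough sketch; the model name is arbitrary and the path assumes the merged file is in the current directory):
```
echo 'FROM ./merged_file.gguf' > Modelfile
ollama create deepseek-v3.1-local -f Modelfile
ollama run deepseek-v3.1-local
```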
All docs are made by humans (primarily my brother and me), just sometimes there might be some typos (sorry in advance)
I'm also uploading Ollama compatible versions directly so ollama run can work (it'll take a few more hours)
> but then the shell commands were actually running llama.cpp, a mistake probably no human would make.
But in the docs I see things like
Wouldn't this explain that? (Didn't look too deep)
By the way, I'm wondering why unsloth (a goddamn python library) tries to run apt-get with sudo (and fails on my nixos). Like how tf are we supposed to use that?
Oh hey I'm assuming this is for conversion to GGUF after a finetune? If you need to quantize to GGUF Q4_K_M, we have to compile llama.cpp, hence apt-get and compiling llama.cpp within a Python shell.
There is a way to convert to Q8_0, BF16, F16 without compiling llama.cpp, and it's enabled if you use `FastModel` and not `FastLanguageModel`.
Essentially I try `sudo apt-get` first; if that fails, plain `apt-get`; and if both fail, it just fails (see the rough shell sketch below). We need `build-essential cmake curl libcurl4-openssl-dev`
See https://github.com/unslothai/unsloth-zoo/blob/main/unsloth_z...
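In shell terms the fallback is roughly this (a simplified sketch of the behaviour, not the exact code - that's in the repo linked above):
```
sudo apt-get install -y build-essential cmake curl libcurl4-openssl-dev \
  || apt-get install -y build-essential cmake curl libcurl4-openssl-dev \
  || { echo "Could not install llama.cpp build deps; please install them manually"; exit 1; }
```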
65 replies →
hey fellow crazy person! slight tangent: one thing that helps keep me grounded with "LLMs are doing much more than regurgitation" is watching them try to get things to work on nixos - and hitting every rake on the way to hell!
nixos is such a great way to expose code doing things it shouldn't be doing.
3 replies →
Thanks for your great work with quants. I would really appreciate UD GGUFs for V3.1-Base (and even more so, GLM-4.5-Base + Air-Base).
Thanks! Oh base models? Interesting since I normally do only Instruct models - I can take a look though!
It’d also be great if you guys could do a fine tune to run on an 8x80G A/H100. These H200/B200 configs are harder to come by (and much more expensive).
Unsloth should work on any GPU setup all the way until the old Tesla T4s and the newer B200s :) We're working on a faster and better multi GPU version, but using accelerate / torchrun manually + Unsloth should work out of the box!
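For the manual multi-GPU route, the usual pattern is just wrapping your fine-tuning script (here a hypothetical `train.py` that loads the model with Unsloth) with torchrun or accelerate, e.g.:
```
torchrun --nproc_per_node=8 train.py
```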
1 reply →
>250GB, how do you guys run this stuff?
I'm working on sub 165GB ones!
165GB will need a 24GB GPU + 141GB of RAM for reasonably fast inference or a Mac
for such dynamic 2bit quants, are there any benchmark results showing how much performance I would give up compared to the original model? thanks.
Currently no, but I'm running them! Some people on the aider discord are running some benchmarks!
1 reply →
if you are running a 2bit quant, you are not giving up performance but gaining 100% performance, since the alternative is usually 0%. Smaller quants are for folks who otherwise wouldn't be able to run anything at all, so you run the largest quant your hardware allows. I, for instance, often ran Q3_K_L; I don't think about how much performance I'm giving up, but rather how, without Q3, I wouldn't be able to run it at all. With that said, for R1, I did some tests against 2 public interfaces and my local Q3 crushed them. The problem with a lot of model providers is we can never be sure what they are serving up, and they could take shortcuts to maximize profit.
5 replies →
For reference, here is the terminal-bench leaderboard:
https://www.tbench.ai/leaderboard
Looks like it doesn't get close to GPT-5, Claude 4, or GLM-4.5, but still does reasonably well compared to other open weight models. Benchmarks are rarely the full story though, so time will tell how good it is in practice.
garbage benchmark, inconsistent mix of "agent tools" and models. if you wanted to present a meaningful benchmark, the agent tools would stay the same so we could really compare the models.
with that said, there are plenty of other benchmarks that disagree with this one. from my experience most of these benchmarks are trash: use the model yourself, apply your own set of problems, and see how well it fares.
Hey. I like your roast on benchmarks.
I also publish my own evals on new models (using coding tasks that I curated myself, without tools, rated by a human with rubrics). Would love for you to check them out and give your thoughts:
Example recent one on GPT-5:
https://eval.16x.engineer/blog/gpt-5-coding-evaluation-under...
All results:
https://eval.16x.engineer/evals/coding
Which benchmarks are not garbage?
I don't consider myself super special. I think it should be doable to create a benchmark that beats me having to test every single new model.
tbh companies like Anthropic and OpenAI create custom agents for specific benchmarks
Do you have a source for this? I’m intrigued
1 reply →
Aren't good benchmarks supposed to be secret?
9 replies →
Depends on the agent. Ranks 5 and 15 are Claude 4 Sonnet, and this stands close to 15th.
My personal experience is that it produces high quality results.
Any example or prompt you used to make this statement?
6 replies →
Vine is about the only benchmark I think is real.
We made objective systems turn out subjective answers… why the shit would anyone think objective tests would be able to grade them?
The DeepSeek R1 in that list is the old model that's been replaced. Update: Understood.
Yes, and 31.3% is given in the announcement as the performance of the new v3.1, which would put it in sixteenth place.
1 reply →
Yeah, but the pricing is insane. I don't care about SOTA if it breaks my bank.
Looks to be the ~same intelligence as gpt-oss-120B, but about 10x slower and 3x more expensive?
https://artificialanalysis.ai/models/deepseek-v3-1-reasoning
Other benchmark aggregates are less favorable to GPT-OSS-120B: https://arxiv.org/abs/2508.12461
With all these things, it depends on your own eval suite. gpt-oss-120b works as well as o4-mini over my evals, which means I can run it via OpenRouter on Cerebras where it's SO DAMN FAST and like 1/5th the price of o4-mini.
4 replies →
> same intelligence as gpt-oss-120B
Let's hope not, because gpt-oss-120B can be dramatically moronical. I am guessing the MoE contains some very dumb subnets.
Benchmarks can be a starting point, but you really have to see how the results work for you.
My experience is that gpt-oss doesn't know much about obscure topics, so if you're using it for anything except puzzles or coding in popular languages, it won't do as well as the bigger models.
Its knowledge seems to be lacking even compared to GPT-3.
No idea how you'd benchmark this though.
> My experience is that gpt-oss doesn't know much about obscure topics
That is the point of these small models. Remove the bloat of obscure information (address that with RAG), leaving behind a core “reasoning” skeleton.
1 reply →
Something I was doing informally that seems very effective is asking for details about smaller cities and towns and lesser points of interest around the world. Bigger models tend to have a much better understanding and knowledge base for the more obscure places.
3 replies →
I don't think you're necessarily wrong, but your source is currently only showing a single provider. Comparing:
https://openrouter.ai/openai/gpt-oss-120b and https://openrouter.ai/deepseek/deepseek-chat-v3.1 for the same providers is probably better, although gpt-oss-120b has been around long enough to have more providers, and presumably for hosters to get comfortable with it / optimize hosting of it.
Clearly, this is a dark harbinger for Chinese AI supremacy /s
It's a hybrid reasoning model. It's good with tool calls and doesn't think too much about everything, but it regularly uses outdated tool formats randomly instead of the standard JSON format. I guess the V3 training set has a lot of those.
Did you try the strict (beta) function calling? https://api-docs.deepseek.com/guides/function_calling
What formats? I thought the very schema of json is what allows these LLMs to enforce structured outputs at the decoder level? I guess you can do it with any format, but why stray from json?
Sometimes it will randomly generate something like this in the body of the text:
```
<tool_call>executeshell
<arg_key>command</arg_key>
<arg_value>echo "" >> novels/AI_Voodoo_Romance/chapter-1-a-new-dawn.txt</arg_value>
</tool_call>
```
or this:
```
<|toolcallsbegin|><|toolcallbegin|>executeshell<|toolsep|>{"command": "pwd && ls -la"}<|toolcallend|><|toolcallsend|>
```
Prompting it to use the right format doesn't seem to work. Claude, Gemini, GPT-5, and GLM 4.5 don't do that. To accommodate DeepSeek, the tiny agent that I'm building will have to support all the weird formats.
3 replies →
In the strict modes in APIs, the sampling code essentially rejects and re-infers any sampled token that wouldn't create valid JSON under a grammar created from the schema. Generally the training is doing 99% of the work, of course; "strict" just means "we'll check its work to the point that a GBNF grammar created from the schema will validate it." (There's a quick local sketch of the same idea below.)
One of the funnier info scandals of 2025 has been that only Claude was even close to properly trained on JSON file edits until o3 was released, and even then it needed a bespoke format. Geminis have required using a non-formalized diff format from Aider. It wasn't until June that Gemini could do diff-string-in-JSON better than 30% of the time, and not until GPT-5 that an OpenAI model could. (Though v4a, as OpenAI's bespoke edit format is called, is fine because it at least worked well in tool calls. Gemini's was a clown show: you had to post-process regular text completions to parse out any diffs.)
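To make the grammar bit concrete, llama.cpp lets you try the same constrained sampling locally: it ships a JSON grammar, and passing it restricts generation to tokens that keep the output valid under that grammar. A sketch (assuming a llama.cpp checkout; `model.gguf` stands in for whatever model you're running):
```
./llama.cpp/llama-cli -m model.gguf \
  --grammar-file llama.cpp/grammars/json.gbnf \
  -p "Emit a JSON object with a single \"command\" field containing a shell command."
```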
6 replies →
It seems behind Qwen3 235B 2507 Reasoning (which I like) and gpt-oss-120B: https://artificialanalysis.ai/models/deepseek-v3-1-reasoning
Pricing: https://openrouter.ai/deepseek/deepseek-chat-v3.1
Those Qwen3 2507 models are the local creme-de-la-creme right now. If you've got any sort of GPU and ~32gb of RAM to play with, the A3B one is great for pair-programming tasks.
Do you happen to know if it can be run via an eGPU enclosure with, e.g., an RTX 5090 inside, under Linux?
I've been considering buying a Linux workstation and I want it full AMD. But if I can just plug in an NVIDIA card via an eGPU enclosure for self-hosting LLMs, that would be amazing.
11 replies →
Do we get these good qwen models when using qwen-code CLI tool and authing via qwen.ai account?
4 replies →
I use it on a 24GB Tesla P40 GPU. Very happy with the result.
3 replies →
With qwen code?
I too like Qwen a lot; it's one of the best models for programming. I generally use it via the chat.
Some of it is in Kagi already. Impressive from both DeepSeek and Kagi.
Is Kagi a Chinese-backed company?
I don't think so: https://help.kagi.com/kagi/company/
Not sure if it's just chat.deepseek.com, but one strange thing I've noticed is that now it replies to like 90% of your questions with "Of course.", even when it doesn't fit the prompt at all. Maybe it's the backend injecting it to be more obedient? But you can tell it `don't begin the reply to this with "of" ending "course"` and it will listen. It's very strange.
Some people on reddit (very reliable source I know) are saying it was trained on a lot of Gemini and I can see that. for example it does that annoying thing gemini does now where when you use slang or really any informal terms it puts them in quotes in its reply
> for example it does that annoying thing gemini does now where when you use slang or really any informal terms it puts them in quotes in its reply
Haven't used Gemini much, but the time I used it, it felt very academic and theoretical compared to Opus 4. So that seems to fit. But I'll have to do more evaluation of the non-Claude models to get a better idea of the differences.
All this points to "personality" being a big -- and sticky -- selling point for consumer-facing chat bots. People really did like the chatty, emoji-filled persona of the previous ChatGPT models. So OpenAI was ~forced to adjust GPT-5 to be closer to that style.
It raises a funny "innovator's dilemma" that might happen. Where an incumbent has to serve chatty consumers, and therefore gets little technical/professional training data. And a more sober workplace chatbot provider is able to advance past the incumbent because they have better training data. Or maybe in a more subtle way, chatbot personas give you access to varying market segments, and varying data flywheels.
Seems to hallucinate more than any model I've ever worked with in the past 6 months.
DeepSeek is bad for hallucinations in my experience. I wouldn't trust its output for anything serious without heavy grounding. It's great for fantastical fiction though. It also excels at giving characters "agency".
Where would you go to find people posting their AI generated fiction? I haven't been able to find it on Reddit
5 replies →
What context length did you use?
Did they "borrow" bad data this time?
It's a very smart move for DeepSeek to put out an Anthropic-compatible API, similar to Kimi K2 and GLM-4.5 (puzzled as to why Qwen didn't do this). You can set up a simple function in your .zshrc to run Claude Code with these models:
https://github.com/pchalasani/claude-code-tools/tree/main?ta...
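The gist is pointing Claude Code's Anthropic environment variables at DeepSeek's Anthropic-compatible endpoint. A sketch of the idea (the function name is arbitrary, and the endpoint/env vars are what I'd expect from DeepSeek's docs and Claude Code's config - double-check against the linked repo):
```
# in ~/.zshrc
dsclaude() {
  ANTHROPIC_BASE_URL="https://api.deepseek.com/anthropic" \
  ANTHROPIC_AUTH_TOKEN="$DEEPSEEK_API_KEY" \
  claude "$@"
}
```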
Wow, thanks! I just ran into my Claude Code session limit like an hour ago, so I tried the method you linked and added 10 CNY to a DeepSeek API account; an hour later I've got 7.77 CNY left and have used 3.3 million tokens.
I'm not confident enough to say it's as good as Claude Opus or even Sonnet, but it seems not bad!
I did run into an API error when my context exceeded DeepSeek's 128k window and had to manually compact the context.
Qwen have their own competitor to Claude Code.
Sad to see the off peak discount go. I was able to crank tokens like crazy and not have it cost anything. That said the pricing is still very very good so I can't complain too much.
Unrelated, but it would really be nice to have a chart breaking down Price Per Token Per Second for various model, prompt, and hardware combinations.
There is one: https://pricepertoken.com/
Claude's Opus pricing is nuts. I'd be surprised if anyone uses it without the top max subscription.
3 replies →
So, is the output price there why most models are extremely verbose? Is it just a ploy to make extra cash? It's super annoying that I have to constantly tell it to be more and more concise.
> It's super annoying that I have to constantly tell it to be more and more concise.
While system prompting is the easy way of limiting the output in a somewhat predictable manner, have you tried setting `max_tokens` when doing inference? For me that works very well for constraining the output: if you set it to 100 you get very short answers, while if you set it to 10,000 you can get very long responses.
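For example, against an OpenAI-compatible endpoint like DeepSeek's it's a single extra field in the request body (a sketch; endpoint and model name as per their public API docs):
```
curl https://api.deepseek.com/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $DEEPSEEK_API_KEY" \
  -d '{
    "model": "deepseek-chat",
    "max_tokens": 100,
    "messages": [{"role": "user", "content": "Explain MoE routing, briefly."}]
  }'
```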
Is it good at tool use? For me tool use is table stakes; if a model can't use tools then it's almost useless.
Looks quite competitive among open-weight models, but I guess still behind GPT-5 or Claude a lot.
this might be OT and covered somewhere else, but what's the latest/greatest on these models and their effect on the linguistics field, and conversely, what do the latest and greatest in linguistics think about these models?
Cries in 128k context. Probably will be a good orchestrator though, can always delegate to Gemini.
It still can't name all the states in India.
That's interesting. I am curious about the extent of the training data in these models.
I asked Kimi K2 for an account of growing up in my home town in Scotland, and it was ridiculously accurate. I then asked it to do the same for a similarly sized town in Kerala. ChatGPT suggested that while it was a good approximation, K2 got some of the specifics wrong.
Cheap!
$0.56 per million tokens in — and $1.68 per million tokens out.
That's actually a big bump from the previous pricing: $0.27/$1.10
And unfortunately no more half price 8-hours a day either :(
The next cheapest capable model is GLM 4.5 at $0.6 per million input tokens and $2.2 per million output tokens. Glad to see DeepSeek is still the value king.
But I am still disappointed with the price increase.
how can deepseek be so cheap* yet so effective?
*pricing (deepseek-chat / deepseek-reasoner):
1M input tokens (cache hit): $0.07
1M input tokens (cache miss): $0.56
1M output tokens: $1.68
I think it's a combination of the MoE model architecture and inference done in large batches, run in parallel.
Hmm. It’s still not close to paid frontier on SWE bench.
In my experience, Qwen 3 coder has been very good for agentic coding with Cline. I tried DeepSeek v3.1 and wasn't pleased with it.
I have yet to see evidence that it is better for agentic coding tasks than GLM-4.5
Is that it? Nothing else you haven't seen evidence for?
Just that
Bubble popped yet?
About halfway between V3 and Qwen3 Coder.
https://brokk.ai/power-ranking?version=openround-2025-08-20&...
Is gpt-5 Mini free from any providers?
Duck.ai has it as an option
just saw this on the Chinese internet - DeepSeek officially mentioned that v3.1 is trained using UE8M0 FP8, as that is the FP8 format to be supported by the next-gen Chinese AI chips. so basically -
some Chinese next-gen AI chips are coming, and DeepSeek is working with them to get its flagship model trained using such domestic chips.
interesting times ahead! just imagine what it could do to NVIDIA's share price when DeepSeek releases a SOTA model trained without using NVIDIA chips.
Time to short Nvidia?
No, because people never really talk about the quantity of the alternatives -- i.e. Huawei Ascend. Even if Huawei can match the quality, their yields are still abysmal. The numbers I've heard are in the hundreds of thousands vs. millions by Nvidia. In the near future, Nvidia's dominance is pretty secure. The only thing that can threaten it is if this whole AI thing isn't worth what some people imagined it is worth and people start to realize this.
There's no evidence v3.1 was trained on Chinese chips (the statement was very ambiguous; they only said they adapted the model for Chinese chips, which could mean training or inference).
Anyway, from my experience, if China really had advanced AI chips capable of a SOTA model, I'm sure the propaganda machine would go all out; look at how they boasted about a Huawei CPU that's two generations behind Qualcomm and TSMC.
V interesting, thanks for sharing
They say the SWE bench verified score is 66%. Claude Sonnet 4 is 67%. Not sure if the 1% difference here is statistically significant or not.
I'll have to see how things go with this model after a week, once the hype has died down.
[dead]
[flagged]
Every country acts in its own best interest; the US is not unique in this regard.
wait until you find out that China also acts the same way toward the rest of the world (surprise pikachu face)
This does not make any sense to me. “There”? “‘Nationalist’ () bans” of and by whom?
Dark propaganda opposed to what, light propaganda? The Chinese model being released is about keeping China down?
You seem very animated about this, but you would probably have more success if you tried to clarify this a bit more.
[flagged]
Incredible how "keeping their people down" means leaps in personal wealth and happiness for huge swathes of the population and internal criticism is that it is a "poverty reduction machine" that is too focused.
6 replies →
I'm doing this model
Reminder DeepSeek is a Chinese company whose headstart is attributed to stealing IP from American companies. Without the huge theft, they'd be nowhere.
As if those american companies played fair with training their AIs
It's theft all the way down, son
I can't say whether those claims are true. But even if they were, it feels selective. Every major AI company trained on oceans of data they didn't create or own. The whole field was built on "borrowing" IP, open-source code, academic papers, datasets, art, text, you name it.
Drawing the line only now... saying this is where copying stops being okay doesn't seem very fair. No AI company is really in a position to whine about it from my POV (ignoring any lawyer POV). Cue the world's smallest violin
Can you contrast this with Western companies? What are the Chinese companies stealing that Western companies aren’t? Do you mean tech or content?
Ethics of Chinese vs. Western companies? Everything. I'm sure you're aware of how many hundreds of $billions of American IP are stolen by Chinese companies.
4 replies →
Most ironic comment I've yet laid eyes on.
I find it hilarious you felt the need to make this comment in defense of American LLMs. You know that American LLMs aren’t trained ethically either, right? Many people’s data was used for training without their permission.
BTW DeepSeek has contributed a lot, with actual white papers describing in detail their optimizations. How are the rest of the American AI labs doing in contributing research and helping one another advance the field?
Reminder that OpenAI is an American company whose headstart is attributed to stealing copyrighted material from everyone else. Without the huge theft, they'd be nowhere.
Last I checked, as it concerns the training of their models, all legal challenges are pending. No theft has yet been proven, as they used publicly available data.
3 replies →
If an American company did this, it would be "innovative bootstrapping". Yawn.
who cares