Comment by wg0
19 hours ago
A wise man at Google once wrote, in an internal memo, something to the tune of: "We have no moat, and neither does anyone else."
Deepseek v4 is good enough, and really, really good given the price it's offered at.
PS: Just to be clear - even the most expensive AI models are unreliable, make stupid mistakes, and their code output MUST be reviewed carefully, so Deepseek v4 is not any different. It, too, is just a random token generator based on token frequency distributions with no real thought process, like all other models such as Claude Opus etc.
I don’t think LLMs are that great at creating, however much they've improved; I need to stay in the driver's seat and really understand what’s happening. There’s not that much leverage in eliminating typing.
However, for reviewing, I want the most intelligent model I can get. I want it to really think the shit out of my changes.
I’ve just spent two weeks debugging what turned out to be a bad SQLite query plan (with no reliable repro). Not one of the many agents, nor GPT-Pro, thought to check this. I guess SQL query-planner issues are a hole in their reviewing training data. Maybe Mythos will check such things.
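For anyone who hits something similar: checking the plan by hand is cheap. A minimal sketch (hypothetical table and index names, not the actual schema from the bug) of how a missing index shows up in SQLite's `EXPLAIN QUERY PLAN` output:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, user_id INTEGER, ts INTEGER)")

def plan(sql):
    # EXPLAIN QUERY PLAN rows carry a human-readable step description
    # in their last column (e.g. a table scan vs. an index search).
    return [row[-1] for row in con.execute("EXPLAIN QUERY PLAN " + sql)]

query = "SELECT * FROM events WHERE user_id = 42"
print(plan(query))   # without an index: a full table scan

con.execute("CREATE INDEX idx_events_user ON events(user_id)")
print(plan(query))   # with the index: an index search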
I’m a little conflicted on this, as I see a slippery slope here. LLMs in their current state (e.g., Opus-4.7) are really good at planning and one-shot codegen, which I believe is their primary use case. So they do provide enough leverage in that regard.
With this new workflow, however, we should, uncompromisingly, steer the entire code review process. The danger here, the “slippery slope,” is that we’re constantly craving more intelligent models so we can somehow outsource the review to them as well. We may be subconsciously engineering ourselves into obsolescence.
Subconsciously?!?
3 replies →
I feel the industry moving away from the automated slop machine, and back to conscious design. Is that only my filter bubble? Dex, dax, the CEO of sentry, Mario (pi.dev) - strong voices, all declaring the last half year a fever dream we must wake up from.
1 reply →
> just a random token generator based on token frequency distributions with no real thought process
I'm not smart enough to reduce LLMs and the entire AI effort to such simple terms, but I am smart enough to see the emergence of a new kind of intelligence, even when it threatens the very foundations of the industry I work in.
It's an illusion of intelligence. Just like when a non-technical person saw a TV for the first time and thought the people must be living inside that box.
He didn't know about the 40,000-volt electron gun constantly bombarding the phosphor, leaving a glow for a few milliseconds until the next pass.
He thought those guys lived inside that wooden box; there was no other explanation.
Right, but this electron box led to one of the largest (if not the largest) media revolutions, one that has transformed the course of humanity in a frightening way we're still trying to grapple with.
Saying "LLMs are autocorrect" still isn't wrong, but nobody says "phones are just electrons and silicon" to diminish their power and influence anymore.
1 reply →
What happens when it's indistinguishable from a human speaker (in any conceivable test that makes sense)? It's like a philosophical zombie - imagine that you can't distinguish it from a human mind, there's no test you can make to say that it is NOT conscious/intelligent. So at some point, I think, it makes no sense to say that it's not intelligent.
7 replies →
Many people struggle to differentiate between illusion and reality, these days.
There's a sucker born every minute, after all.
> It's an illusion of intelligence.
A simulation, not an illusion. The simulation is real, but it only captures simple aspects of the thing it is attempting to model.
The lost jobs and the decrease in demand for software engineers don't seem like an illusion. Demand might come back eventually, but I wouldn't bet on it.
1 reply →
I've had to adjust my priors about LLMs. Have you?
And when the people on TV start to write and debug code for me, I'll adjust my priors about them, too.
> emergence of a new kind of intelligence
Curious about your definition of these terms.
Just because you are impressed by the capabilities of some tech (and rightfully so), doesn't mean it's intelligent.
The first time I realized what recursion can do (like solving the Towers of Hanoi in a few lines of code), I thought it was magic. But that doesn't make it the "emergence of a new kind of intelligence".
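(For the record, it really is just a few lines — the standard recursive sketch:)

```python
def hanoi(n, src, dst, aux, moves):
    """Move n disks from src to dst, recording each move as (from, to)."""
    if n == 0:
        return
    hanoi(n - 1, src, aux, dst, moves)  # park the n-1 smaller disks on the spare peg
    moves.append((src, dst))            # move the largest disk to the target
    hanoi(n - 1, aux, dst, src, moves)  # restack the smaller disks on top of it

moves = []
hanoi(3, "A", "C", "B", moves)
print(len(moves))  # 2**3 - 1 = 7
```

Impressive the first time you see it, but entirely mechanical once you know the trick — which is rather the point.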
A recent one is the RCA of a hang during PostgreSQL installation caused by an unimplemented syscall (I work at a lab that deals with secure OSes and sandboxes). If the root-cause search had been left to me, I would have spent 2-3 weeks sifting through the shared-memory implementation within PostgreSQL, but it only took me a night with the help of Opus 4.5.
To me, that's intelligence and a measurable direct benefit of the tool.
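For anyone curious how that class of bug surfaces: on Linux, an unimplemented syscall (or one denied by a sandbox's seccomp filter) answers with ENOSYS, which you can spot in an strace log or provoke directly. A Linux-only sketch using a deliberately bogus syscall number:

```python
import ctypes
import errno

# Load the C library; use_errno lets us read errno after the raw call.
libc = ctypes.CDLL(None, use_errno=True)

# Invoke a syscall number no kernel implements. A sandbox denying a
# real syscall via seccomp's ENOSYS action looks identical to the caller.
ret = libc.syscall(999999999)
if ret == -1 and ctypes.get_errno() == errno.ENOSYS:
    print("ENOSYS: syscall not implemented")
```

A program that retries such a call in a loop instead of handling the error is exactly the kind of thing that presents as a mysterious hang.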
6 replies →
> Curious about your definition of these terms.
Likewise - I think sometimes we ascribe a mythical aura to the concept of “intelligence” because we don’t fully understand it. We should limit that aura to the concept of sentience, because if you can’t call something that can solve complex mathematical and programming problems (amongst many other things) intelligent, the word feels a bit useless.
1 reply →
> definition of these terms
To me, "intelligence" is a term that's largely useless due to being ill-defined for any given context or precision.
Not really on topic anymore, but…
I keep wondering when this discussion comes up… If I take an apple and paint it like an orange, it’s clearly not an orange. But how much would I have to change the apple for people to accept that it’s an orange?
This discussion keeps coming up in all aspects of society, like (artificial) diamonds and other, more polarizing topics.
It’s weird and it’s a weird discussion to have, since everyone seems to choose their own thresholds arbitrarily.
I feel like these examples are all where human categorical thinking doesn’t quite map to the real world. Like the “is a hotdog a sandwich” question. “hotdog” and “sandwich” are concepts, like “intelligence”. Oftentimes we get so preoccupied with concepts that we forget that they’re all made-up structures that we put over the world, so they aren’t necessarily going to fit perfectly into place.
I think it’s a waste of time to try and categorize AI as “intelligent” or “not intelligent” personally. We’re arguing over a label, but I think it’s more important to understand what it can and can’t do.
Superficially? Looks like an orange, feels like an orange, tastes like an orange. Basically it passes something like the Turing test.
Scientifically? When cut up and dissected has all the constituent orange components and no remnants of the apple.
No you aren’t, clearly.
Deepseek v4, Qwen 3.6 Plus/Max, GLM 5+ are all pretty solid for most work.
Don't forget the Kimi 2.6 as well!
I agree. Data and userbase are still the moats.
Once a new model or a technique is invented, it’s just a matter of time until it becomes a free importable library.
I went and tried to debug a script. I asked deepseek 4 pro and Claude the same prompt; they both made the exact same decisions, which led to the exact same issue, and to me telling them it's still not working, with context, over a dozen times.
Over a dozen times they both gave the same answer, not word for word, but with the exact same reasoning.
The difference is that deepseek did it at 1/40th of the price (API).
To be honest, deepseek V4 pro is 75% off currently, but still we're speaking of something like $3 vs $20.
Fully agree, I only pay the minimum for frontier models to get DeepSeek v4 output reviewed. I don't see this changing either because we have reached a level of good enough at this point.
> Deepseek v4 is good enough, really really good given the price it is offered at.
Do they have monthly subscriptions, or are they restricted to paying just per token? It seems to be the latter for now: https://api-docs.deepseek.com/quick_start/pricing/
Really good prices admittedly, but having predictable subscriptions is nice too!
It's indeed the latter. Psychologically harder for me than a $20/mo sub but still a better value for the money. I'm finding myself spending closer to $40-$60 a month w/ openrouter without a forced token break.
Edit: it looks like it's 75% off right now which is really an incredible deal for such a high caliber frontier model.
Neat, dumb question - are the tokens you prepay for good forever, or do they expire? And do they provide any assurances or SLAs about speed? (i.e., that in a year they won't decide to dole out response tokens to you at a snail's pace)
You can just input your $X per month/week/whatever yourself as API credits
You make your own subscription. If you want to pay $20/month then put $20 into your account. When you use it up, wait till the next month (or buy more).
> You make your own subscription.
I'm asking because with most providers (most egregiously, with Anthropic) it doesn't work that way because the API pricing is way higher than any subscription and seemingly product/company oriented, whereas individual users can enjoy subsidized tokens in the form of the subscription. If DeepSeek only offers API pricing for everyone, I guess that makes sense and also is okay!
[flagged]
This account is clearly astroturfing.
1 reply →
Can Deepseek answer probing questions about Winnie the Pooh?
What are you using LLMs for? To learn about the world's politics? Oh boy, do I have news for you…
One of the first things I did when OpenAI's chatbot came out was ask it "which active politician is a spy?" - and it was blocked from the start.
I asked early on, back when people were posting various jailbreaks; none of them ever worked.
On a side note, any self hosted model I can get for my PC? I have 96 GB of RAM.
2 replies →
I can't even make American AIs say no-no words. All AIs are lobotomized drones.
Do you often find yourself asking your Chinese employees what they think about Winnie the Pooh?
Is it subject to CCP censorship? Maybe.
It's fun to pretend the US models have no censorship constraints.
7 replies →
Yeah, I specifically asked it about it. It seemed less censored than Gemini, back when it appeared and the latter was quite useless.
It understands everything in thinking mode and will break down its rule system for adhering to Chinese regulation.
So if you or anyone passing by was curious, yes you can get accurate output about the Chinese head of state and political and critical messages of him, China and the party
Its final answer will not play along
If you want an unfiltered answer on that topic, just triage it to a western model, if you want unfiltered answers on Israel domestic and foreign policy, triage back to an eastern model. You know the rules for each system and so does an LLM
PS: Just to be clear - even the most expensive humans are unreliable, make stupid mistakes, and their output MUST be reviewed carefully, so you're not any different either. You're just a random next-thought generator based on neuron firing distributions with no real thought process, trained on a few billion years of evolution like all other humans.
Looks like you have either never worked with a human or never worked with an LLM; otherwise, arriving at such a conclusion is damn near impossible.
The humans I did work with were very, very bright. No software developer in my career ever needed more than a paragraph in a JIRA ticket for the problem statement; they figured out domains that weren't even theirs to begin with, without making any mistakes, and not only identified edge cases but sometimes actually improved the domain processes by suggesting what was wasteful and what could be done differently.
I think you are very fortunate. I have worked with plenty of software developers like that; in fact, the overwhelming majority of them have been like that.
1 reply →
I have worked with people like this frequently. The ones you're always happy to see on the team.
Also worked with people who were frustrated that they had to force-push git to "save" their changes. Honestly, a token-box I can just ignore would be an upgrade over this half of the team.
I can't tell if you're joking..
I and everybody else here call BS on that. People make mistakes all the time. Arguably at similar or worse rates.
> The humans I did work with [...] figured out domains that were not even theirs to begin with without making any mistakes
Seriously? I would like to remind you that every single mistake in history until the last couple of years has been made by humans.
Uhh, what? I speak to LLMs in broken English with minimal details and they figure it out better than I would have if you'd told me the same garbage.
Holy shit, you've never worked with anyone who made ANY mistakes? You must be one of those 10x devs I hear about. Wow, cool, please stay away from my team.
1 reply →
I'm still not sure what people declaring that they equate human cognition with large language models think they are contributing to the conversation when they do so.
Never mind the fact that they are literally able to introspect human cognition and presumably find non-verbal and non-linear cognition modes.
> Never mind the fact that they are literally able to introspect human cognition and presumably find non-verbal and non-linear cognition modes.
Are they, though? Or are they just predicting their own performance (and an explanation of that performance) on input the same way they predict their response to that input?
Humans say a lot of biologically implausible things when asked why they did something.
1 reply →
But once a human learns a function, their errors are more predictable. And they can predict their own errors before an operation and escalate or seek outside review/advice.
For example, ask any model "which class of problems and domains do you have a high error rate in?".
Humans can be held accountable. States have not yet shown the will to hold anyone accountable for LLM failures.
They are tools. You hold the human using it accountable. If that means it's the executive who signed the PO, so be it.
Until LLMs, I'd never in my life heard someone suggest we lock up the compiler when it goofs up and kills someone. But now, because the compiler speaks English, we suddenly want to let people use it as a get-out-of-jail-free card when they use it to harm others.
1 reply →
You're free to hold an LLM accountable in the exact same way: fire it if you don't like its work.
3 replies →
As fallible as they may be, I've never had a next-thought generator recommend me glue as a pizza ingredient.
No big brother or big sister?
You must not have kids
Are you making the pizza for eating or for menu photography? I seem to recall glue being used in menu photography ‘food’ a lot.
Amusing and directionally correct, but as random next-thought generators connected to a conscious hypervisor with individual agency,* humanity still has a pretty major leg up on the competition.
*For some definitions of individual agency. Incompatibilists not included.
Equating human thought to matrix multiplication is insulting to me, you, and humanity.
I hate that I agree with you. But there's a difference between whether AI is as powerful as some say, and whether it's good for humanity. A cursory review of human history shows that some revolutionary technologies make life as a human better (fire, writing, medicine) and others make it worse (weapons, drugs, processed foods). While we adapt to the commoditization of our skills, we should also be questioning whether the technologies being rolled out right now are going to do more harm than good, and we should be organizing around causes that optimize for quality of life as a human. If we don't push for that, then the only thing we're optimizing for is wealth consolidation.
Errr... No. Please take this bullshit propaganda to a billionaires twitter feed.
Don't they have the moat of being able to test their models on billions of people and gather feedback?
This is just starting to feel like desperation, making this claim that SOTA LLMs are random token generators with absolutely no possibility of anything above that. Keep shouting into the wind, though.
"Deepseek v4 is good enough, really really good given the price it is offered at."
Kimi, MiMo, and GLM 5.1 all score higher and are cheaper.
They all came out before DeepSeek v4. I think you're pattern-matching on last year's discourse.
(I haven't seen other replies, yet, but I assume they explain the PS that amounts to "quality doesn't matter anyway": which still doesn't address the fact it's more expensive and worse.)
We can't rule out a new innovation that makes frontier models more relevant than deepseek in 6 months. Things evolve so fast.
Equally you can't rule out innovation that makes deepseek more relevant than American models
We can, because the reality is that America has led in AI since the beginning and has had the best frontier models. It's not like some other country has held the top spot for any given period of time; no one in Europe or China has. I'd give it the benefit of the doubt if there were precedent. But the only logical position to take is that the lead is widening, and while most AIs will pass some threshold where they're good enough for most people, the actual frontier will remain firmly on American soil.
3 replies →
>[LLMs are just] random token generator based on token frequency distributions with no real thought
... and who knows if we, humans, are not merely that.
What a crock of BS. A brain is "just" electrochemistry and a novel is "just" arrangements of letters. The question isn't the substrate; it's what structure emerges on top of it. Anthropic's own interpretability work has surfaced internal features that look like learned concepts, planning, and something resembling goal-directed reasoning. Calling the outputs random is wrong in a specific way: the distribution is extraordinarily structured.
AI will never.... Until it does.
> internal features that look like learned concepts, planning, and something resembling goal-directed reasoning.
It's always so unspecific. Resembles this, seems that, almost such, danger that... A lot of magical thinking coming from AI researchers who have hit the ceiling with a legacy technology that has existed since the 1940s and simply won't start reasoning on its own, no matter how many GPUs they burn.
> Calling the outputs random is wrong in a specific way, the distribution is extraordinarily structured.
No, it's actually very correct in a very specific way. Ask any programmer using the parrots: lately the "quality" has deteriorated so much that, coupled with the incoming price hikes, many will just abandon the technology unless someone else is carrying the cost, such as their employer. And as an employer, I also don't want to carry the costs of a technology that benefits us ever less.