Comment by hei-lima

9 hours ago

We need another "DeepSeek moment", or else it will become impossible for the regular dude to use AI. It will become something only big companies can afford.

40 comments

SwellJoe 8 hours ago

We're having DeepSeek moments every couple of weeks.

Qwen 3.6 hit hard in the self-hosting space. It's incredibly capable for its size, really shaking up what's possible in 64GB or even 32GB of VRAM.

The Prism Bonsai ternary model crams a tremendous amount of capability into 1.75GB.
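As a rough sanity check on these VRAM figures, the weights alone take about params × bits-per-weight / 8 bytes. The sketch below is a back-of-the-envelope calculator, assuming decimal gigabytes and ignoring KV cache, activations, and runtime overhead. The bit widths are illustrative quantization levels, not the settings of any particular release, and the function names are hypothetical. For a mixture-of-experts model such as the Qwen 3.6-35B variant mentioned in this thread, memory follows the total parameter count, not the smaller number of parameters active per token.

```python
import math


def weight_memory_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the weights alone, in decimal GB (1e9 bytes).

    Ignores KV cache, activations, and runtime overhead, which need extra VRAM.
    """
    total_bits = params_billion * 1e9 * bits_per_weight
    return total_bits / 8 / 1e9


def params_that_fit_billion(budget_gb: float, bits_per_weight: float) -> float:
    """Inverse estimate: roughly how many parameters (in billions) fit in budget_gb."""
    total_bits = budget_gb * 1e9 * 8
    return total_bits / bits_per_weight / 1e9


# A ternary weight (-1, 0, +1) carries log2(3) ≈ 1.585 bits of information;
# practical formats often pack less tightly (e.g. 2 bits per weight).
TERNARY_BITS = math.log2(3)

if __name__ == "__main__":
    # Illustrative bit widths for a 35B-parameter model, not any release's settings.
    for bits in (16, 8, 6, 4):
        print(f"35B @ {bits}-bit: ~{weight_memory_gb(35, bits):.1f} GB of weights")
    # Theoretical upper bound on ternary parameters in a 1.75 GB file.
    bound = params_that_fit_billion(1.75, TERNARY_BITS)
    print(f"1.75 GB at log2(3) bits/weight: at most ~{bound:.1f}B parameters")
```

At a nominal 4 bits, a 35B model needs about 17.5 GB for its weights, which leaves headroom for context on a 24GB or 32GB card; at 8 bits (35 GB) it would not fit on either. Real quantized files carry extra scale data, so they come out somewhat larger than these nominal numbers.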

And, DeepSeek V4 is crazy good for the price. They're charging flash model prices for their top-tier Pro model, which is competitive with the frontier of a few months ago.

The winners in the AI war will be the companies that figure out how to run models efficiently, not the ones that eke out a couple percent better performance on a benchmark while spending ten times as much on inference. The capability has to be there, but I think we're seeing that capability alone isn't a strong moat: there's enough competent competition to ensure there are always at least a few options, even at the very frontier of capability.

Zambyte 8 hours ago
> It's incredibly capable for its size, really shaking up what's possible in 64GB or even 32GB of VRAM.
You can go down to 24GB, at least. I've been running Qwen 3.5 and 3.6 with Codex on a 7900 XTX, and the long-horizon tasks it can handle successfully have been blowing my mind. Based purely on how productive I can be, I would seriously choose my current local setup over the SOTA models and ecosystem of a year ago.
- hei-lima 4 hours ago
  
  Gonna try it.
trollbridge 8 hours ago
We have Qwen 3.6-35b (6) on a 5090 (32GB) and it's blowing me away. Works fine for most (not all) code generation tasks. One developer here has been extremely stubborn about adopting AI; he's finally adopted it, albeit only when it's coming from a local model like this.
DeepSeek V4 Pro likewise is insanely good for the price. I simply point it at large codebases, go get a cup of coffee or browse Hacker News, and then it's done useful work. This was simply not possible with other models without hitting budget problems.
- akulbe 6 hours ago
  
  Any chance you'd be willing to talk further about your setup? I have 2 x 3090s in a local machine, and I'm still left with questions about how best to use stuff locally.

squidbeak 9 hours ago

DeepSeek had another moment a few weeks ago. V4 isn't far behind the US frontier, and so far its Flash variant seems a very reliable coder that costs a pittance.

ai_fry_ur_brain 9 hours ago
DeepSeek V4 (not Flash) tripled in price too, by the way (direct from DeepSeek). Get used to this pattern.
This is what you get for relying on the generosity of billionaires. Keep offshoring your thinking ability to a machine and let me know how competitive you are. Hint: you won't be. There's nothing special about being able to use an LLM.
- npn 9 hours ago
  
  Unlike other providers, DeepSeek does promise that they will lower the price when their Huawei cards arrive in a few more months.
- ls612 9 hours ago
  
  Anyone can host DeepSeek V4 on rented GPUs and sell inference on it. The price will very quickly converge to the marginal cost of inference. This is as close to a pure commodity as it gets in the AI space, so competitive market economics will put in work. The same is true for any open-weights model.
- dpoloncsak 9 hours ago
  
  Mate, why are you so mad at people who are upset that the price tripled? It's a fair complaint: people built services on the cheaper models expecting future models to be similarly priced. You can avoid "offloading thinking" while still building on top of these models.
- zaptrem 7 hours ago
  
  V4-Pro is about 2.4× total params and 1.3× active params of V3.2.
- creationcomplex 6 hours ago
  
  You're typing as your handwriting and letter-sending abilities deteriorate to dust. Writing down information as your memory capacity decays. Remembering instead of living at the pure leading edge of perception, dulling your reactions.
  Smh, it's all downhill from the first unadulterated neuron.
- aurareturn 9 hours ago
  
  I think demand is too great and compute is not enough. Nothing to do with billionaires colluding to increase prices by 3x.
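To make the "marginal cost of inference" point from this subthread concrete, a host's floor price is roughly the hourly GPU rental divided by the tokens served in that hour. The sketch below assumes full utilization; the function name and the GPU count, hourly rate, and throughput in the example are hypothetical placeholders, not measured figures for DeepSeek V4 or any provider.

```python
def cost_per_million_tokens(gpu_dollars_per_hour: float, tokens_per_second: float) -> float:
    """Marginal serving cost in dollars per 1M generated tokens.

    Assumes the hardware is fully utilized; idle time raises the real cost.
    """
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    tokens_per_hour = tokens_per_second * 3600
    return gpu_dollars_per_hour / tokens_per_hour * 1_000_000


if __name__ == "__main__":
    # Hypothetical inputs: an 8-GPU node rented at $2 per GPU-hour,
    # serving 5,000 tokens/s aggregate across batched requests.
    node_rate = 8 * 2.00
    print(f"${cost_per_million_tokens(node_rate, 5_000):.2f} per 1M tokens")
```

Aggregate throughput comes mostly from batching many concurrent requests, so the marginal cost per token falls as utilization rises; that is where competing hosts of the same open weights would undercut one another.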

xbmcuser 8 hours ago

What we need is a DeepSeek moment in hardware, i.e. China reaching parity on node size. That is the only way the latest computers, let alone the latest AI, will be available to us in the future; otherwise, profit margins will push most production toward AI.

throwa356262 8 hours ago
To be honest, China not having access to the latest hardware is exactly what has driven LLM technology forward the last 2 years.
- humanfromearth9 8 hours ago
  
  Why?
blackoil 1 hour ago

Open-source ASML's EUV. But that would wipe trillions off US stocks, so your 401(k) may not like it.

stared 6 hours ago

We have a "DeepSeek moment": https://huggingface.co/bartowski/Qwen_Qwen3.6-35B-A3B-GGUF

segmondy 9 hours ago

You can use lots of open weight models today.

hei-lima 9 hours ago

That's one solution to the problem. But it still needs some good computational capabilities. Either we optimize the hell out of those models, or we wait for the hardware to become good enough for them.
Gigachad 6 hours ago

The real problem is the hardware to run them is still very expensive.

pianopatrick 8 hours ago

Maybe we can figure out better ways to use the models that can run on cheap hardware.

GeorgeOldfield 9 hours ago

Gemini isn't even that good. Just tested 3.5 on the usual complex prompts I give Opus/chat 5.5. Meh.

k8sToGo 8 hours ago
Are you really comparing flash to opus? Shouldn't you be comparing pro?
- CognitiveLens 8 hours ago
  
  The benchmark tables in the Google announcement include Opus 4.7, and the numbers are very impressive. Caveat emptor, but it's not unreasonable to compare a new Flash to a current-gen Opus, even if some of the results confirm expectations.
bachmeier 8 hours ago

Who would have guessed that something costing roughly a third as much wouldn't do as well at certain tasks.
kmac_ 8 hours ago

Well, my first impression is that Gemini still goes off the instruction rails more easily than other models, but I noticed that it tends to return to the initial goal without hand-holding, which is a real improvement. It's really interesting that these models behave so differently.