Comment by gcr

12 hours ago

DwarfStar4 is a small LLM inference runtime that can run DeepSeek 4. The blog post implies that it currently requires 96GB of VRAM.

For anyone else lacking context :-)

Thanks. Outside of LLM circles, DS4 is usually a video game controller.

That's the Flash version, not the full model, and only at ~Q2-3 quantization, so while impressive it's still quite different from the full model.

  • Not really. I'm now building another fast C compiler with DeepSeek 4 Flash, and I rarely have to step outside it to use Pro or Sonnet, GPT or Kimi-2.6. Flash is very capable at almost everything.

> The blog post implies that it currently requires 96GB of VRAM.

Has anyone tested what happens if you try to run this on lower-RAM Macs? It might work and just be a bit slower as it falls back to fetching model layers from storage (roughly the mmap pattern sketched below).
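For concreteness, here's a minimal sketch of that fallback: memory-map the weight file and let the OS page cache decide what stays resident. This only illustrates the mechanism, not DwarfStar4's actual loading code; `weights.bin` and the offset/size are made up.

```python
# Sketch of the fallback described above: mmap the weight file so the
# OS page cache keeps whatever fits in RAM and faults the rest in from
# storage on demand. "weights.bin" and the offset/count are hypothetical.
import mmap

import numpy as np

with open("weights.bin", "rb") as f:
    buf = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)

# Reading a resident region is just a copy out of page cache; reading
# an evicted region blocks on disk I/O, which is where the per-token
# slowdown comes from once the model no longer fits in RAM.
expert_bytes = np.frombuffer(buf, dtype=np.uint8, offset=0, count=4096)
```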

  • It'd be way slower, since you'd be doing that work for every token.

    • True (with 64GB of RAM it would already have to fetch 20% of its active experts from disk, about 650MB/tok at 2-bit quant, and that percentage rises quickly as you lower RAM further; back-of-envelope below). My question is a more practical one: does it run at all, how bad is the slowdown, and to what extent could you claw back some of that decode throughput by running multiple (slower) agent sessions in parallel under a single DwarfStar4 server?
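To make that arithmetic concrete, here's a back-of-envelope sketch. The 13B active-parameter count and the 3GB/s SSD read rate are assumptions chosen to reproduce the figures above, not published specs of DeepSeek 4 Flash or DwarfStar4.

```python
def disk_bytes_per_token(active_params: float, bits_per_weight: float,
                         miss_rate: float) -> float:
    """Bytes read from storage per decoded token, assuming every
    non-resident active expert must be fetched in full."""
    active_bytes = active_params * bits_per_weight / 8
    return active_bytes * miss_rate

# Assumed: 13B active params at 2-bit quant touch ~3.25GB per token;
# if 20% of that misses RAM, each token pulls ~650MB from storage.
per_tok = disk_bytes_per_token(active_params=13e9, bits_per_weight=2,
                               miss_rate=0.20)
print(f"{per_tok / 1e6:.0f} MB/token")       # -> 650 MB/token

# At an assumed sustained 3GB/s of SSD reads, that alone caps decode:
print(f"{3e9 / per_tok:.1f} tok/s ceiling")  # -> 4.6 tok/s ceiling
```

Since each token's decode would stall on those reads, several parallel agent sessions against one server could overlap the stalls and recover some aggregate throughput, even though each individual session stays slow.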

> The blog post implies that it currently requires 96GB of VRAM.

From the GitHub page, it seems it only supports Apple and DGX Spark. I have 128 GB of RAM and a 3090, but it probably won't work.