Comment by albertzeyer

3 hours ago

More information about DwarfStar 4 (DS4) in the readme: https://github.com/antirez/ds4

The code seems based on llama.cpp and GGML.

I don't fully understand why it is a standalone project. The readme discusses this: "DwarfStar 4 is a small native inference engine specific for DeepSeek V4 Flash. It is intentionally narrow: ..."

I think the only major difference between DeepSeek V4 and other models is probably the type of self-attention. That in turn leads to the readme's point that the "KV cache is actually a first-class disk citizen."

But I still feel those changes could have been implemented within one of the existing local inference engines.

I also assume more models will come out, not just from DeepSeek but also from others, and they might share similar self-attention approaches that would benefit from a similar KV cache implementation.
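
To make the "first-class disk citizen" idea a bit more concrete, here is a toy sketch of a KV cache that lives in a memory-mapped file and survives being reopened. This is not how ds4 does it (I have not checked its on-disk format). The class name `DiskKVCache`, the fixed-size float32 record layout, and the 8-byte length header are all my own assumptions for illustration.

```python
import mmap
import os
import struct
import tempfile

# Hypothetical file layout: an 8-byte token count, then one fixed-size
# record per token (dim float32 key values followed by dim float32 values).
HEADER = struct.Struct("<Q")


class DiskKVCache:
    """Toy append-only KV cache backed by a memory-mapped file."""

    def __init__(self, path, dim, max_tokens):
        self.dim = dim
        self.max_tokens = max_tokens
        self.record = struct.Struct(f"<{2 * dim}f")
        size = HEADER.size + self.record.size * max_tokens
        # Reuse the file only if it already has the expected size.
        is_new = not os.path.exists(path) or os.path.getsize(path) != size
        self.file = open(path, "w+b" if is_new else "r+b")
        if is_new:
            self.file.truncate(size)  # zero-filled, so the stored count is 0
        self.map = mmap.mmap(self.file.fileno(), size)
        (self.length,) = HEADER.unpack_from(self.map, 0)

    def append(self, key, value):
        if len(key) != self.dim or len(value) != self.dim:
            raise ValueError("key and value must each have `dim` entries")
        if self.length >= self.max_tokens:
            raise IndexError("cache is full")
        offset = HEADER.size + self.length * self.record.size
        self.record.pack_into(self.map, offset, *key, *value)
        self.length += 1
        HEADER.pack_into(self.map, 0, self.length)  # persist the new count

    def get(self, pos):
        if not 0 <= pos < self.length:
            raise IndexError("position out of range")
        offset = HEADER.size + pos * self.record.size
        flat = self.record.unpack_from(self.map, offset)
        return list(flat[: self.dim]), list(flat[self.dim :])

    def close(self):
        self.map.flush()
        self.map.close()
        self.file.close()


def roundtrip_demo():
    """Write one token, close, reopen, and read it back from disk."""
    path = os.path.join(tempfile.mkdtemp(), "kv.bin")
    cache = DiskKVCache(path, dim=2, max_tokens=4)
    cache.append([1.0, 2.0], [0.5, -1.0])
    cache.close()
    reopened = DiskKVCache(path, dim=2, max_tokens=4)
    result = (reopened.length, reopened.get(0))
    reopened.close()
    return result
```

The point is only that the cache is a file first and memory second: reopening the same path picks up where the previous run left off. A real engine would also have to deal with per-layer layout, quantization, and invalidation, and none of that seems tied to one specific engine.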