Comment by albertzeyer

3 hours ago

More information about DwarfStar 4 (DS4) in the readme: https://github.com/antirez/ds4

The code seems to be based on llama.cpp and GGML.

I don't fully understand why it is a standalone project. The readme discusses this: "DwarfStar 4 is a small native inference engine specific for DeepSeek V4 Flash. It is intentionally narrow: ..."

I think the only major difference between DeepSeek V4 and other models is perhaps the type of self-attention. And that leads to the readme's point: "KV cache is actually a first-class disk citizen."

But I still feel like those changes could have been implemented as part of some of the other local engines.

I also assume more models will come out, not just from DeepSeek but also from others, and they might share similar self-attention approaches that would benefit from a similar KV cache implementation.

2 comments

skiwithuge 3 hours ago

Because llama.cpp doesn't accept PRs made entirely by AI agents, even if they are guided by the author.

https://github.com/ggml-org/llama.cpp/blob/master/AGENTS.md

embedding-shape 3 hours ago

Which makes sense: the number of PRs llama.cpp receives from authors who have no clue what they're doing and can't even answer simple questions about "what they did" is staggering. It must be very exhausting to have to figure out "is it worth replying to this author?" for every single PR.