Comment by 0xbadcafebee

13 hours ago

I don't see an explanation of why they would make a model-specific inference engine vs. just using llama.cpp. There are already lots of people working on the llama.cpp integration. This is a lot of effort spent on a single model, which is likely to become obsolete when a different model comes out that does better. In some discussions, people are now making PRs against both the llama.cpp branches and DS4, so it's taking a rare commodity (people investing development time in this model) and fragmenting it.

Way easier to work on a focused C codebase you own than a mature, unwieldy C++ codebase you don't. But it's fine: people will take that work and port it to llama.cpp, and everyone wins.

(The UX of DS4 is fantastic too: it's dead easy to get a known-good model with a great quant. With llama.cpp you're much more hacking in the wilderness, with many, many knobs.)

I believe the assumption is: the code is cheap; the collaboration (e.g., upstreaming) is expensive.

Is it true? We'll see, in a few years.

The author has mentioned many times that the llama.cpp maintainers don't want code that's predominantly written by AI with no human review. If anyone wants to try to get the support upstreamed into that project, they're quite free to do so: the code is MIT licensed.

  • Also, Antirez has been able to use GPT to iterate on the code and its performance. He and the other DS4 contributors have a set of known-good result files to ensure that correctness is maintained, plus benchmarks to verify performance, and the LLM is able to iterate within that framework (see the sketch after this comment). Having a small, focused codebase helps here.

    Antirez explained the dev process when he posted a pure C implementation of the Flux 2 Klein image gen model, at https://news.ycombinator.com/item?id=46670279
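
    A minimal sketch of what that iteration loop might look like, assuming a hypothetical ./ds4 binary with deterministic settings (the binary name, flags, and file names here are invented for illustration; this is not Antirez's actual harness): generate output for a fixed prompt and seed, then diff it byte-for-byte against a stored known-good result file, so any change the LLM proposes can be checked mechanically.

        /* check_output.c -- hypothetical golden-file regression check.
         * Deterministic generation (fixed seed, greedy sampling) makes
         * a byte-for-byte comparison against a reference meaningful. */
        #include <stdio.h>
        #include <stdlib.h>

        /* Compare two files byte by byte; returns 0 if identical. */
        static int files_match(const char *a_path, const char *b_path) {
            FILE *a = fopen(a_path, "rb");
            FILE *b = fopen(b_path, "rb");
            if (!a || !b) { if (a) fclose(a); if (b) fclose(b); return -1; }
            int ca, cb, same = 1;
            do {
                ca = fgetc(a);
                cb = fgetc(b);
                if (ca != cb) { same = 0; break; }
            } while (ca != EOF);
            fclose(a);
            fclose(b);
            return same ? 0 : 1;
        }

        int main(void) {
            /* Run the (hypothetical) engine with deterministic settings. */
            if (system("./ds4 --seed 42 --temp 0 -p \"test prompt\" > current.txt") != 0) {
                fprintf(stderr, "generation failed\n");
                return 1;
            }
            if (files_match("expected.txt", "current.txt") != 0) {
                fprintf(stderr, "regression: output differs from known-good reference\n");
                return 1;
            }
            printf("OK: output matches reference\n");
            return 0;
        }

    Benchmarks close the loop on the performance side: as long as a check like this stays green and throughput doesn't regress, the LLM can rewrite aggressively within that framework.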

At a certain point, the level of abstraction and genericization necessary for a big, flexible project (like llama.cpp or Linux) blows things up into a huge number of files. Something newer and smaller can move faster.