Comment by thot_experiment
2 hours ago
I just build llama.cpp from scratch on the PR that has MTP drafters.
https://github.com/ggml-org/llama.cpp/pull/23398
Please don't use Ollama, it's a bad actor in the OSS community.
I don't have the energy to build stuff all the time, that's a rabbit-hole side tunnel I don't really want to get into. I have larger concerns in my life that are more urgent than developing that side of things.
But I've moved on from Ollama for the time being, though I am mainly interested to see what the Gemma 4 MTP speeds are like on my M1 Max, so I may test it.
I am quite impressed with the tools in LM Studio, which is also a beautiful app, but it is not open source (which challenges my personal strategy somewhat) and I dread its inevitable enshittification.
Nevertheless, the GUI has been very helpful while I learn, and I will probably use it until something else presents itself or my usage pattern settles down from experimentation to something a bit more routine.
I will try oMLX, too, but judging by the LiteRT page I may soon be able to just use that for the larger models if I end up settling with Gemma 4.
Totally understandable. YMMV but I found the llama.cpp build process to work on the first try on my machine, and it only takes a couple minutes, which definitely isn't my usual expectation or experience. I was very pleasantly surprised. Their web-ui is also getting very polished while still doing a great job of letting you tweak all the weird settings.
Sorry, I sounded a bit terse there!
You have probably convinced me to give it a try, to be honest.
It's just that, to cut a long story short, I am currently recovering from a level of burnout so severe that twelve months ago it had me fully convinced I was actually in early-onset cognitive decline (I am a bit over fifty).
Only a little over two months ago I was still sure I'd have to quit IT and find a slow job because I was so out of the loop; this whole industry shift even in just the last few months is so shocking and strange.
So I have to be a bit cautious about how many indirections I add, if that makes sense. But I am compiling bigger projects than llama.cpp so I will give it a go.
Thank you for the extra detail.