Comment by butILoveLife
9 hours ago
>Time to first token measured with an 8K-token prompt using a 14-billion parameter model with 4-bit quantization
Oh dear, 14B and a 4-bit quant? There are going to be a lot of embarrassed programmers who need to explain to their engineering managers why their MacBook can't reasonably run LLMs like they said it could. (This already happened at my Fortune 20 company lol)
Yeah no it didn't. If you have a fully specced-out M3/M4 MacBook with enough memory, you're running pretty decent models locally already. But no one is using local models anyway.
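For scale, here's a back-of-envelope estimate (my own arithmetic, not from the article) of why a 14B model at 4-bit fits comfortably in a well-specced MacBook's unified memory:

```python
# Rough weight-memory footprint of a quantized model:
# parameters * bits_per_weight / 8 bytes. This ignores the KV cache
# and activation memory, which add more on top at long contexts.

def weight_memory_gb(params_billion: float, bits_per_weight: int) -> float:
    """Approximate weight memory in GB (1 GB = 1e9 bytes)."""
    return params_billion * 1e9 * bits_per_weight / 8 / 1e9

# A 14B model quantized to 4 bits per weight:
print(weight_memory_gb(14, 4))  # 7.0 GB of weights alone
```

So the weights alone are ~7 GB; with an 8K-token KV cache and activations on top, a 16 GB machine is tight but a 32 GB+ one has plenty of headroom.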
I run a local model on the daily. I have it making tickets when certain emails come in, and I made a small app that I can click to approve ticket creation. It follows my instructions and has a nice chain-of-thought process trained. Local LLMs are starting to become very useful. Not OpenClaw crap.
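A minimal sketch of the approval-gated flow described above, with assumptions: the local model (served however you like, e.g. behind an OpenAI-compatible endpoint) is prompted to return JSON, and the actual model call is stubbed out here; `parse_ticket` and the field names are hypothetical, not the commenter's actual code.

```python
import json

# Prompt asking the local model for structured output (hypothetical wording).
PROMPT_TEMPLATE = (
    "Extract a ticket from this email. Reply with JSON only, using the "
    'keys "title", "priority", and "summary".\n\nEmail:\n{email}'
)

def parse_ticket(model_reply: str) -> dict:
    """Validate the model's JSON reply before showing it for approval."""
    ticket = json.loads(model_reply)
    missing = {"title", "priority", "summary"} - ticket.keys()
    if missing:
        raise ValueError(f"model reply missing fields: {sorted(missing)}")
    return ticket

def create_ticket_if_approved(model_reply: str, approved: bool):
    """Only create the ticket after an explicit human click, as described."""
    ticket = parse_ticket(model_reply)
    return ticket if approved else None
```

The human-in-the-loop gate is the interesting design choice: the model only drafts, and nothing hits the ticket system until someone clicks approve.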
> Yeah no it didn’t
What is "it" and what didn't it do?
With OpenClaw and powerful local models like Kimi 2.5, these specs make a lot of sense.
K2.5 isn't remotely a local model
I wonder if Apple had the foresight to anticipate locally run LLMs becoming sufficiently useful.
It won't handle serious tasks, but I have Gemma 3 installed on my M2 Mac and it is good for most of my needs, especially data I don't want a corporation getting its hands on.
What kind of tasks are you using it for? I haven't really found any uses for small models.
They do! "You're holding it wrong."