Comment by butILoveLife
9 hours ago
>Time to first token measured with an 8K-token prompt using a 14-billion parameter model with 4-bit quantization
Oh dear, 14B and a 4-bit quant? There are going to be a lot of embarrassed programmers who need to explain to their engineering managers why their MacBook can't reasonably run LLMs like they said it could. (This already happened at my Fortune 20 company lol)
Yeah no it didn't. If you have a fully specced-out M3/M4 MacBook with enough memory, you're running pretty decent models locally already. But no one is using local models anyway.
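For scale, here's a back-of-envelope estimate (my own arithmetic, not from the article) of why a 14B model at 4-bit fits comfortably in a well-specced MacBook's unified memory:

```python
# Rough weight-memory footprint of a quantized model:
# parameters * bits_per_weight / 8 bytes. This ignores the KV cache
# and activation memory, which add more on top at long contexts.

def weight_memory_gb(params_billion: float, bits_per_weight: int) -> float:
    """Approximate weight memory in GB (1 GB = 1e9 bytes)."""
    return params_billion * 1e9 * bits_per_weight / 8 / 1e9

# A 14B model quantized to 4 bits per weight:
print(weight_memory_gb(14, 4))  # 7.0 GB of weights alone
```

So the weights alone are ~7 GB; with an 8K-token KV cache and activations on top, a 16 GB machine is tight but a 32 GB+ one has plenty of headroom.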
I run a local model on the daily. I have it making tickets when certain emails come in, and I made a small app that I can click to approve ticket creation. It follows my instructions and has a nice chain-of-thought process trained. Local LLMs are starting to become very useful. Not OpenClaw crap.
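A minimal sketch of the approval-gated flow described above, with assumptions: the local model (served however you like, e.g. behind an OpenAI-compatible endpoint) is prompted to return JSON, and the actual model call is stubbed out here; `parse_ticket` and the field names are hypothetical, not the commenter's actual code.

```python
import json

# Prompt asking the local model for structured output (hypothetical wording).
PROMPT_TEMPLATE = (
    "Extract a ticket from this email. Reply with JSON only, using the "
    'keys "title", "priority", and "summary".\n\nEmail:\n{email}'
)

def parse_ticket(model_reply: str) -> dict:
    """Validate the model's JSON reply before showing it for approval."""
    ticket = json.loads(model_reply)
    missing = {"title", "priority", "summary"} - ticket.keys()
    if missing:
        raise ValueError(f"model reply missing fields: {sorted(missing)}")
    return ticket

def create_ticket_if_approved(model_reply: str, approved: bool):
    """Only create the ticket after an explicit human click, as described."""
    ticket = parse_ticket(model_reply)
    return ticket if approved else None
```

The human-in-the-loop gate is the interesting design choice: the model only drafts, and nothing hits the ticket system until someone clicks approve.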
> Yeah no it didn’t
What is "it" and what didn't it do?
With OpenClaw and powerful local models like Kimi 2.5, these specs make a lot of sense.
K2.5 isn't remotely a local model
I wonder if Apple had the foresight to anticipate locally run LLMs becoming sufficiently useful.
It won't handle serious tasks, but I have Gemma 3 installed on my M2 Mac and it is good for most of my needs, especially data I don't want a corporation getting its hands on.
What kind of tasks are you using it for? I haven't really found any uses for small models.
They do! "You're holding it wrong."