
Comment by seanmcdirmid

2 days ago

I’ve been running LLMs on my laptop (M3 Max, 64GB) for a year now and I think they are ready, especially with how good mid-sized models are getting. I’m pretty sure unified memory and energy-efficient GPUs will be more than just an Apple-laptop thing in the next few years.

You doing code completion and agentic stuff successfully with local models? Got any tips? I've been out of the game for [checks watch] a few months and am behind on the latest. Is Cline the move?

  • I haven't bothered doing code completion locally yet, though it's something I want to try with the Qwen model. I'm mostly using it to generate/fix code CLI-style, along the lines of the sketch below.
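
A minimal sketch of what that CLI-style workflow can look like, assuming an Ollama-style local server exposing an OpenAI-compatible endpoint on its default port (the model tag and prompt are placeholder assumptions, not anything from the thread):

```python
# Hypothetical example: chatting with a locally served model through
# Ollama's OpenAI-compatible endpoint. Substitute whatever model tag
# you have actually pulled locally.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:11434/v1",  # Ollama's default port
    api_key="ollama",  # ignored by Ollama, but the client requires a value
)

resp = client.chat.completions.create(
    model="qwen2.5-coder:32b",  # assumed local model tag
    messages=[
        {"role": "user",
         "content": "Fix the off-by-one error in this loop: "
                    "for i in range(1, len(xs)): print(xs[i])"},
    ],
)
print(resp.choices[0].message.content)
```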

Only because of Apple's unified memory architecture. The groundwork is there; we just need memory to be cheaper so we can fit 512+GB now ;)

  • In the end, there's not all that much point in having more memory than you can compute on in a reasonable time. So I think the useful amount probably tops out around 128GB, where you can still run a 70B model and get a useful token rate out of it (back-of-envelope below).
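
A back-of-envelope sketch of why bandwidth, not capacity, sets that ceiling. All of the figures here are illustrative assumptions, not measurements:

```python
# Rough decode-speed estimate: generating each token streams the active
# weights through memory once, so tokens/s ≈ bandwidth / model size.
bandwidth_gb_s = 400        # assumed M3 Max-class memory bandwidth
params_b = 70               # 70B-parameter model
bytes_per_param = 0.5       # ~4-bit quantization

model_gb = params_b * bytes_per_param        # ~35 GB of weights
tokens_per_s = bandwidth_gb_s / model_gb     # ~11 tokens/s
print(f"~{model_gb:.0f} GB model -> ~{tokens_per_s:.0f} tokens/s")
# A model that filled 512GB at the same bandwidth would crawl at ~1 token/s,
# which is why capacity beyond what you can compute on buys little.
```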

  • Memory prices will rise short term and generally fall long term; even with the current supply hiccup, the answer is just to build out more capacity (which will happen if there is healthy competition). I mean, I expect the other mobile chip providers to adopt unified architectures with beefy on-chip GPU cores and lots of bandwidth connecting them to memory (at the Max or Ultra tier, at least). I think AMD is already doing unified memory, at least?