Comment by Insanity

1 day ago

But you have to factor in that this device will last you 5-10 years. That said, I wouldn't spend almost $7k USD on this MacBook lol.

Memory requirements of newer models will increase, so while the hardware may last 10 years it won't be able to run the latest models for 10 years.

  • My experience working in the open model space pretty deeply (both LLMs and diffusion models) for years now is that it is not quite as simple as that.

    In the open model space an insane amount of effort goes into getting more powerful models to run with the same or less RAM. For example, in the diffusion world many things that could not easily be run with under 24GB of VRAM actually run much better today, with much less VRAM, than they did a few years ago. You can do many things today with 8-16GB of VRAM that would not have been possible back then. At the same time the most advanced open models, like LTX 2.3 for video gen, still seem to respect 24GB of VRAM as the upper bound.

    Similarly, the standard "big" but localish open model for LLMs back in the day was Llama 3 70B, which was both a much worse and a much larger model than Qwen 3.6 27B.

    So in two different spaces I've witnessed the "RAM required to run the best" decreasing or at least remaining stable, while the performance being achieved in both areas is astounding (LTX 2.3 is faster, better and more capable than the Wan 2.2 model that held popularity before it).

    The biggest thing to watch out for is not just RAM/VRAM but memory bandwidth. You can try to "future proof" yourself with lots of RAM, but if it's 400 GB/s you're still constrained to smaller models.
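A rough way to see why bandwidth caps model size: during decoding, a dense model has to stream roughly all of its weights from memory for every generated token, so bandwidth divided by weight size gives a ceiling on tokens per second. The sketch below is a back-of-the-envelope estimate under that assumption only. The function names are made up for illustration, 4-bit weights are an assumed quantization, and KV cache reads, compute limits, and runtime overhead are all ignored, so real numbers will be lower.

```python
def weights_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate size of the weights in GB (1 GB = 1e9 bytes)."""
    return params_billions * bits_per_weight / 8


def max_decode_tokens_per_sec(bandwidth_gb_s: float, gb_read_per_token: float) -> float:
    """Bandwidth-bound ceiling: each token streams gb_read_per_token from memory."""
    return bandwidth_gb_s / gb_read_per_token


if __name__ == "__main__":
    # Assumed setup: dense models at 4 bits per weight on a 400 GB/s machine.
    for params_b in (27, 70):
        size = weights_gb(params_b, bits_per_weight=4)
        ceiling = max_decode_tokens_per_sec(400, size)
        print(f"{params_b}B dense @ 4-bit: {size:.1f} GB, <= {ceiling:.1f} tok/s at 400 GB/s")
```

The point of the estimate is that extra RAM lets a bigger model load, but the speed ceiling drops in proportion to how many bytes each token has to touch.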

    • > The biggest thing to watch out for is not just RAM/VRAM but memory bandwidth. You can try to "future proof" yourself with lots of RAM, but if it's 400 GB/s you're still constrained to smaller models.

      I'm thinking of getting a SoC machine with 128GB RAM but the bandwidth is limited to 256 GB/s. Would you even consider such a machine a decent investment, or should I wait for the newer gen of chips? Thanks!

    • > insane amount of effort goes into getting more powerful models to run with the same or less RAM

      The same can be said about operating system memory requirements. I am sure Linux and Windows kernel developers can confirm. Yet 30 years ago Solaris ran comfortably in 16 MB of RAM; today you need 512 times that (8 GB) to run Linux.

  • Nah. There are already models at every size on the scale. If you want to run an open 1T model today, you can.

    What's going to happen is that the capability at any given size point is going to get better over time as new training regimes cram more into the available space. A 27b model released next year will be better than a 27b model this year (else why release it?). Hardware will get more useful, not less.

  • You raise a fair point, but I'm not convinced it'll offer a meaningful difference in performance as long as we're stuck with the current AI paradigm.

  • Will they? Or will we find ways to optimize models and need less? Only time will tell.

  • It can't run the latest models today - GLM-5.2 class models already need 1TB+ of RAM.

    ... but, the models that WILL run on 128GB (or 64GB or even 32GB) models today are a huge improvement on the best models that would run in the same amount of memory six months ago.

    • > GLM-5.2 class models already need 1TB+ of RAM.

      If you quantize GLM-5.2 to 4 bit, you can do it in less than 500GB: https://huggingface.co/unsloth/GLM-5.2-GGUF (table on the right)

      If you find three friends who also have a 128GB MacBook, you can chain them together (the MacBooks, not your friends) and make it work.

      You could also run GLM-5.2 on a single MacBook if you stream the active parameters from disk, but even with speculative decoding, you'd probably only get on the order of 1 token per second, so this is not really practical for most applications.
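The arithmetic behind the quantize-and-chain idea can be sketched like this. Weight size scales linearly with bits per weight, and the number of machines is just the model size divided by RAM per machine, rounded up. The function names and the 100B parameter count are hypothetical, chosen only to show the scaling (no parameter count for GLM-5.2 is assumed here), and the calculation pretends all RAM is usable for weights, which it is not in practice.

```python
import math


def quantized_size_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate weight size in GB (1 GB = 1e9 bytes) at a given precision."""
    return params_billions * bits_per_weight / 8


def machines_needed(model_gb: float, ram_per_machine_gb: float) -> int:
    """Minimum machine count if weights are split evenly and all RAM is usable."""
    return math.ceil(model_gb / ram_per_machine_gb)


if __name__ == "__main__":
    # Hypothetical 100B-parameter model: 16-bit vs 4-bit weights.
    print(quantized_size_gb(100, 16), "GB at 16-bit")
    print(quantized_size_gb(100, 4), "GB at 4-bit")

    # The thread's figures: a model just under 500GB on 128GB MacBooks.
    print(machines_needed(500, 128), "machines of 128GB each")
```

Four machines give 512GB in total, so a model just under 500GB leaves very little headroom for the OS and KV cache.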

  • Available models aren’t really trending upward in size. Not like I thought they would, anyway.

    They’re trending to be the right size to be good.

    Qwen3.6-35B is not as good as Qwen3.6-27B. The larger model is faster, but a lot dumber; it gets caught in loops, makes crazy mistakes, and is just not as good. It’s bigger, but it is nowhere near as good as the 27B variant.

    • Qwen3.6-35B-A3B is worse than 27B because it's an MoE and 27B is dense. 35B only passes each token through 3B of its total parameters, whereas 27B sends each token through all 27B parameters.
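To make the dense vs MoE distinction concrete, here is a small sketch comparing the two along the axes that matter: memory footprint follows total parameters, while per-token work follows active parameters. The `Model` class is a made-up illustration, the parameter counts are the ones stated above, and the "2 FLOPs per active parameter per token" figure is a common rule of thumb for a forward pass, not a measurement of these models.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    name: str
    total_params_b: float   # billions of parameters that must sit in memory
    active_params_b: float  # billions of parameters each token passes through

    def flops_per_token(self) -> float:
        # Rule of thumb: about 2 FLOPs per active parameter per token.
        return 2 * self.active_params_b * 1e9


dense = Model("27B dense", total_params_b=27, active_params_b=27)
moe = Model("35B-A3B MoE", total_params_b=35, active_params_b=3)

if __name__ == "__main__":
    for m in (dense, moe):
        print(f"{m.name}: {m.total_params_b}B in memory, "
              f"{m.flops_per_token():.1e} FLOPs per token")
    ratio = dense.flops_per_token() / moe.flops_per_token()
    print(f"dense does about {ratio:.0f}x more work per token")
```

So the MoE needs more memory but does roughly a ninth of the per-token work, which is consistent with it being faster while applying far fewer parameters to each token.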

  • I think you have too much faith in context AGI.

    At 128GB, you can fit almost the entire context for Qwen3.6 35B MoE.

    Again, I think you have too much faith in extrapolation. It's like measuring a baby at 0 months, then again at 12 months, and expecting it to grow into a giant.

In 5-10 years, incremental cloud tokens will be far cheaper (likely but not guaranteed).