Comment by rhgraysonii

1 day ago

I have been having pretty good success with Qwen 3.5 9B for "nontrivial but not challenging work, all things considered"; it runs great on my 24GB unified memory M4 Pro MacBook Pro. What do the baseline specs look like, Mac-wise, for getting this model to run? Am I looking at 96GB? 128GB? 256GB?

You might be interested in Ornith 1.0 9B, which is a new intriguing post-training of Qwen 3.5 9B.

Qwen 3.6 27B will run with full GPU offload and a 4-bit quantisation in 64GB on an M1 Max. It is quite slow.

I don't know about 48GB but 64GB should be enough.
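For a rough sense of why 64GB is comfortable, weight memory can be estimated as parameter count × bits per weight ÷ 8. This is a back-of-the-envelope sketch, not a measurement. The ~4.5 effective bits per weight (4-bit formats also store per-block scales), the flat 8GB allowance for KV cache and runtime buffers, and the assumption that the GPU may use about three quarters of unified memory by default are all illustrative assumptions. The function names are hypothetical.

```python
def weights_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate weight storage in GB (1 GB = 10**9 bytes)."""
    # params_billion * 1e9 params * bits_per_weight bits / 8 bits-per-byte / 1e9
    return params_billion * bits_per_weight / 8


def fits(params_billion: float, bits_per_weight: float, ram_gb: float,
         overhead_gb: float = 8.0, gpu_fraction: float = 0.75) -> bool:
    """Rough check: do weights plus overhead fit in the GPU-usable share of RAM?

    overhead_gb (KV cache, runtime buffers) and gpu_fraction (share of unified
    memory the GPU may use) are illustrative assumptions, not measured values.
    """
    needed = weights_gb(params_billion, bits_per_weight) + overhead_gb
    return needed <= ram_gb * gpu_fraction


if __name__ == "__main__":
    for bits in (4.0, 4.5, 8.0):
        print(f"27B @ {bits} bits/weight: {weights_gb(27, bits):.1f} GB")
    for ram in (24, 32, 48, 64):
        print(f"{ram} GB machine, 27B @ 4.5 bits: fits={fits(27, 4.5, ram)}")
```

Longer contexts grow the KV cache, so the overhead allowance is the number most worth adjusting for your own workload.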

  • I've been trying Ornith 1.0 35B and I'm pretty impressed with it: https://simonwillison.net/2026/Jun/29/ornith/

    • It's the one I have loaded right now.

      It got rather tangled up when I tried it with one of my coding tests, a simple WordPress plugin. I deliberately frustrate the model by asking it to write code for an older version of PHP, break WP coding conventions, and use a rather bespoke method for arranging code in objects. So it is a sort of hybrid of a greenfield and brownfield task; a bit muddy.

      It did not do as well as Qwen 3.6 35B, but the way it worked through its thoughts was interesting.

      TBH I struggled to understand what DeepReinforce are doing that is materially different; the explanation of their training technique goes over my head at this point.

    • It is also available in Ollama now, and I'm equally impressed.

  • Thanks! I was thinking of going for the 128GB model for some future-proofing. I figure at this point it's akin to a mechanic keeping great tools around, when it comes to having this sort of homelab and exposing it for your own uses. It's also great practice for building the next era of user-facing computing that will be around as this proliferates.

    • I probably wouldn't buy a 64GB machine again if this were to remain particularly important to me. But I gather memory bandwidth is pretty important here.

      So for example I'd favour a used M1 Max over a used M2 Pro, at least based on my naïve understanding. Not quite sure where the balance changes.

      There appear to be some hardware improvements in the M3 and later regarding the Apple Neural Engine, which I'd hope would show up in MLX performance; I remember seeing some optimisations in image generation models that are only possible on later hardware.

      The GPU cores are progressively better, I believe, but the memory bandwidth is lower. Though perhaps the M4 can get closer to actually saturating that bandwidth.

      (And I must reiterate that my understanding of this stuff is pretty naïve.)

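The memory-bandwidth point in the thread can be made concrete. When decoding is memory-bound, each generated token has to stream roughly all of a dense model's weights from memory, so bandwidth divided by weight size gives an upper bound on tokens per second. This is a sketch under those assumptions: the bandwidth figures are Apple's advertised numbers (400GB/s for the M1 Max, 200GB/s for the M2 Pro), the 27B model at ~4.5 bits per weight is assumed to be dense, and the function name is hypothetical. Real throughput will be lower.

```python
def max_decode_tokens_per_s(bandwidth_gb_s: float, weights_read_gb: float) -> float:
    """Upper bound on token generation speed when decoding is memory-bound.

    Generating one token streams (roughly) every weight the token uses from
    memory once, so bandwidth / bytes-per-token bounds tokens per second.
    Compute limits, KV-cache reads and kernel efficiency push real numbers lower.
    """
    return bandwidth_gb_s / weights_read_gb


if __name__ == "__main__":
    # Assumed: 27B dense model at ~4.5 bits/weight, about 15.2 GB of weights.
    weights = 27 * 4.5 / 8
    for chip, bw in (("M1 Max", 400.0), ("M2 Pro", 200.0)):
        print(f"{chip}: at most {max_decode_tokens_per_s(bw, weights):.1f} tokens/s")
```

Under this simple model, doubling bandwidth doubles the ceiling, which is the intuition behind favouring a used M1 Max over a used M2 Pro. For a mixture-of-experts model, pass only the size of the weights active per token, which is why such models can decode faster than their total parameter count suggests.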