Comment by tarruda

8 days ago

Would love to see a Qwen 3.5 release in the 80-110B range, which would be perfect for 128GB devices. While Qwen3-Next is 80B, it unfortunately doesn't have a vision encoder.

Have you thought about getting a second 128GB device? Open weights models are rapidly increasing in size, unfortunately.

  • Considered getting a 512GB Mac Studio, but I don't like Apple devices due to the closed software stack. I would never have gotten this Mac Studio if Strix Halo had existed in mid-2024.

    For now I will just wait for AMD or Intel to release an x86 platform with 256GB of unified memory, which would allow me to run larger models and stick to Linux as the inference platform.

    • Given the shortage of wafers, the wait might be long. I am, however, working on a bridging solution. Some have already shown Strix Halo clustering; I am working on something similar, but with a prompt-processing (pp) boost.

      Unfortunately, AMD dumped a great device with an unfinished software stack, and the community is rolling with it. Compare that to the DGX Spark, which I think is more cluster-friendly.

Why 128GB?

At 80B, you could do 2 A6000s.
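
For anyone following the sizing argument in this thread, here is a rough back-of-the-envelope sketch. It is not a benchmark. It assumes the weights dominate memory use, that the A6000 in question is the 48GB card (so two of them give 96GB), and it adds a guessed 20% allowance for KV cache and runtime overhead. The function names and the overhead figure are made up for illustration.

```python
def weight_footprint_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights alone, in GB (1 GB = 1e9 bytes)."""
    return params_billion * bits_per_weight / 8


def fits(params_billion: float, bits_per_weight: float, memory_gb: float,
         overhead_fraction: float = 0.2) -> bool:
    """Rough check of whether a model fits in a given memory budget.

    overhead_fraction is an assumed allowance for KV cache, activations and
    the runtime itself; real overhead depends on context length and backend.
    """
    needed = weight_footprint_gb(params_billion, bits_per_weight) * (1 + overhead_fraction)
    return needed <= memory_gb


if __name__ == "__main__":
    # (label, parameters in billions, bits per weight, memory budget in GB)
    cases = [
        ("80B @ 4-bit on 2x 48GB", 80, 4, 96),
        ("110B @ 4-bit on 128GB", 110, 4, 128),
        ("110B @ 8-bit on 128GB", 110, 8, 128),
    ]
    for label, params, bits, mem in cases:
        size = weight_footprint_gb(params, bits)
        print(f"{label}: ~{size:.0f} GB of weights, fits={fits(params, bits, mem)}")
```

Under these assumptions, an 80-110B model fits comfortably in 128GB at 4-bit quantization, while 110B at 8-bit does not once overhead is counted.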

What device is 128GB?