Comment by tarruda
8 days ago
Would love to see a Qwen 3.5 release in the 80-110B range, which would be perfect for 128GB devices. While Qwen3-Next is 80B, it unfortunately doesn't have a vision encoder.
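Rough back-of-envelope for why 80-110B lands well on a 128GB device (a sketch with assumed quantization bit-widths and a hypothetical flat overhead for KV cache and runtime; real numbers vary by context length and runtime):

```python
def model_mem_gb(params_b: float, bits_per_weight: float, overhead_gb: float = 8.0) -> float:
    """Approximate memory to load a model: quantized weights plus a rough
    flat allowance for KV cache and runtime buffers (assumed, not measured)."""
    weights_gb = params_b * bits_per_weight / 8  # params (billions) * bits / 8 ~= GB
    return weights_gb + overhead_gb

for params in (80, 110):
    for bits in (4.5, 8.0):  # roughly Q4_K_M-class and Q8_0-class quantization
        print(f"{params}B @ ~{bits} bpw: ~{model_mem_gb(params, bits):.0f} GB")
```

By this estimate an 80B model at ~8 bpw (~88 GB) still fits under a 96GB GPU allocation, and a 110B model at ~4.5 bpw (~70 GB) fits comfortably in 128GB.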
Have you thought about getting a second 128GB device? Open weights models are rapidly increasing in size, unfortunately.
Considered getting a 512GB Mac Studio, but I don't like Apple devices due to the closed software stack. I would never have gotten this Mac Studio if Strix Halo had existed in mid-2024.
For now I will just wait for AMD or Intel to release an x86 platform with 256GB of unified memory, which would let me run larger models and stick with Linux as the inference platform.
I aspire to casually ponder whether I need a $9,500 computer to run the latest Qwen model
Given the shortage of wafers, the wait might be long. I am however working on a bridging solution. Someone already showed Strix Halo clustering; I am working on something similar but with a prompt-processing boost.
Unfortunately, AMD shipped a great device with an unfinished software stack, and the community is rolling with it, compared to the DGX Spark, which I think is more cluster-friendly.
Why 128GB?
At 80B, you could do 2 A6000s.
What device is 128GB?
AMD Strix Halo / Ryzen AI Max+ (in the Asus Flow Z13 13 inch "gaming" tablet as well as the Framework Desktop) has 128 GB of shared APU memory.
Not quite. They have 128GB of RAM, of which up to 96GB can be allocated to the GPU in the BIOS.
Keep in mind most of the Strix Halo machines are limited to 10GbE networking at best.
DGX Spark and any A10 devices, Strix Halo with the max memory config, several Mac Mini/Mac Studio configs, the HP ZBook Ultra G1a, most servers.
If you're targeting end-user devices, then a more reasonable target is 20GB VRAM, since there are quite a lot of GPU/RAM/APU combinations in that range (orders of magnitude more than at 128GB).
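Inverting the usual weights-size arithmetic shows what a 20GB budget buys (a sketch with assumed bit-widths and a hypothetical flat overhead; actual fit depends on context length and runtime):

```python
def max_params_b(budget_gb: float, bits_per_weight: float, overhead_gb: float = 4.0) -> float:
    """Largest parameter count (in billions) whose quantized weights fit in the
    budget, after reserving an assumed flat overhead for KV cache and buffers."""
    return (budget_gb - overhead_gb) * 8 / bits_per_weight

print(f"~{max_params_b(20, 4.5):.0f}B fits in 20 GB at ~4.5 bpw")
```

So a 20GB card tops out around the ~30B class at ~4-bit quantization, which matches the kinds of models people actually run on consumer GPUs.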
By A6000, do you mean the older Ampere-generation model? 48 GB GDDR6, released 2020 [1]. Can you even buy those new still?
[1] https://www.techpowerup.com/gpu-specs/rtx-a6000.c3686
That's the maximum you can get for $3k-$4k with the Ryzen AI Max+ 395 and Apple Mac Studios. They're far cheaper than dedicated GPUs.
Mac Studios or Strix Halo. GPT-OSS 120b, Qwen3-Next, Step 3.5-Flash all work great on an M1 Ultra.
All the GB10-based devices -- DGX Spark, Dell Pro Max, etc.
Guess it is the Mac M series.
Maybe a DeepSeek V4 distill. Give it a few days.