Comment by tarruda
8 days ago
Would love to see a Qwen 3.5 release in the 80-110B range, which would be perfect for 128GB devices. While Qwen3-Next is 80B, it unfortunately doesn't have a vision encoder.
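Rough back-of-envelope for why 80-110B lands well on a 128GB device (a sketch with assumed quantization bit-widths and a hypothetical flat overhead for KV cache and runtime; real numbers vary by context length and runtime):

```python
def model_mem_gb(params_b: float, bits_per_weight: float, overhead_gb: float = 8.0) -> float:
    """Approximate memory to load a model: quantized weights plus a rough
    flat allowance for KV cache and runtime buffers (assumed, not measured)."""
    weights_gb = params_b * bits_per_weight / 8  # params (billions) * bits / 8 ~= GB
    return weights_gb + overhead_gb

for params in (80, 110):
    for bits in (4.5, 8.0):  # roughly Q4_K_M-class and Q8_0-class quantization
        print(f"{params}B @ ~{bits} bpw: ~{model_mem_gb(params, bits):.0f} GB")
```

By this estimate an 80B model at ~8 bpw (~88 GB) still fits under a 96GB GPU allocation, and a 110B model at ~4.5 bpw (~70 GB) fits comfortably in 128GB.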
Have you thought about getting a second 128GB device? Open weights models are rapidly increasing in size, unfortunately.
Considered getting a 512GB Mac Studio, but I don't like Apple devices due to the closed software stack. I would never have gotten this Mac Studio if Strix Halo had existed in mid-2024.
For now I will just wait for AMD or Intel to release an x86 platform with 256GB of unified memory, which would let me run larger models and stick with Linux as the inference platform.
I aspire to casually ponder whether I need a $9,500 computer to run the latest Qwen model
Given the shortage of wafers, the wait might be long. I am however working on a bridging solution. Someone already showed Strix Halo clustering; I am working on something similar but with a prompt-processing boost.
Unfortunately, AMD shipped a great device with an unfinished software stack, and the community is rolling with it, compared to the DGX Spark, which I think is more cluster-friendly.
Why 128GB?
At 80B, you could do 2 A6000s.
What device is 128GB?
AMD Strix Halo / Ryzen AI Max+ (in the Asus Flow Z13 13 inch "gaming" tablet as well as the Framework Desktop) has 128 GB of shared APU memory.
Not quite. They have 128GB of RAM, of which up to 96GB can be allocated to the GPU in the BIOS.
Keep in mind most of the Strix Halo machines are limited to 10GbE networking at best.
DGX Spark and any A10 devices, Strix Halo with the max memory config, several Mac Mini/Mac Studio configs, the HP ZBook Ultra G1a, most servers.
If you're targeting end-user devices, then a more reasonable target is 20GB VRAM, since there are quite a lot of GPU/RAM/APU combinations in that range (orders of magnitude more than at 128GB).
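Inverting the usual weights-size arithmetic shows what a 20GB budget buys (a sketch with assumed bit-widths and a hypothetical flat overhead; actual fit depends on context length and runtime):

```python
def max_params_b(budget_gb: float, bits_per_weight: float, overhead_gb: float = 4.0) -> float:
    """Largest parameter count (in billions) whose quantized weights fit in the
    budget, after reserving an assumed flat overhead for KV cache and buffers."""
    return (budget_gb - overhead_gb) * 8 / bits_per_weight

print(f"~{max_params_b(20, 4.5):.0f}B fits in 20 GB at ~4.5 bpw")
```

So a 20GB card tops out around the ~30B class at ~4-bit quantization, which matches the kinds of models people actually run on consumer GPUs.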
By A6000, do you mean the older Ampere-generation model? 48 GB GDDR6, released 2020 [1]. Can you even buy those new still?
[1] https://www.techpowerup.com/gpu-specs/rtx-a6000.c3686
That's the maximum you can get for $3k-$4k with the Ryzen AI Max+ 395 and Apple Mac Studios. They're far cheaper than dedicated GPUs.
Mac Studios or Strix Halo. GPT-OSS 120b, Qwen3-Next, Step 3.5-Flash all work great on an M1 Ultra.
All the GB10-based devices -- DGX Spark, Dell Pro Max, etc.
Guess it is the Mac M series.
Maybe a DeepSeek V4 distill. Give it a few days.