Comment by fgonzag

2 days ago

Apple's M5 Max will probably be able to run it decently (as it will fix the biggest issue with the current lineup, prompt processing, in addition to a bandwidth bump).

That should easily run an 8 bit (~360GB) quant of the model. It's probably going to be the first actually portable machine that can run it. Strix Halo does not come with enough memory (or bandwidth) to run it (would need almost 180GB for weights + context even at 4 bits), and they don't have any laptops available with the top end (max 395+) chips, only mini PCs and a tablet.

Right now you only get the performance you want out of a multi GPU setup.