Comment by sgarland

7 hours ago

Tangentially, I bought a nearly identically specced model (I didn't spring for the 8 TB SSD; in retrospect, had I kept it, I would've been OK with the 4 TB) and returned it yesterday due to thermal throttling. I have an M4 Pro w/ 48 GB RAM, and since the M5 Max was touted as being quite a bit faster for various local LLM workloads, I decided I'd try it.

Turns out the heatsink in the 14" isn't nearly enough to handle the Max with all cores pegged. I'd get about 30 seconds of full power before the frequency dropped like a rock.

I haven't really had a problem with thermal throttling, but my heaviest compute activity is inference. The main performance fall-off I've observed is that the token output rate degrades with cache/context size far more aggressively than I expected given the memory bandwidth, compared to the GPU-based inference I've done on a PC. But other than the fans spinning up during prompt processing, I'm able to stay at peak CPU usage without the clock speed dropping. Granted, that's typically only 2-3 minutes of sustained peak compute at a time.
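That fall-off with context size is roughly what a memory-bandwidth-bound model predicts: each decoded token has to stream the weights plus the KV cache for every prior token, so the tokens/sec ceiling drops as context grows. Here's a rough back-of-envelope sketch; all the numbers (bandwidth, weight size, KV bytes per token) are illustrative assumptions, not measurements of any particular machine or model:

```python
# Back-of-envelope model: decode speed is bounded by how fast memory can
# stream (model weights + KV cache) per generated token. All defaults are
# illustrative assumptions, not measured values for any real setup.

def tokens_per_sec(context_len, bandwidth_gbs=546.0, weight_gb=20.0,
                   kv_mb_per_token=0.5):
    """Upper bound on decode rate (tok/s) at a given context length."""
    kv_gb = context_len * kv_mb_per_token / 1000.0
    return bandwidth_gbs / (weight_gb + kv_gb)

for ctx in (1_000, 8_000, 32_000, 128_000):
    print(f"{ctx:>7} tokens of context -> {tokens_per_sec(ctx):5.1f} tok/s ceiling")
```

Under these assumptions the ceiling falls from the high twenties at short context to single digits past ~100k tokens, which is why the curve feels so much steeper than raw bandwidth figures suggest.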

I'm wondering if there was something wrong with your particular unit?