Comment by aurareturn

1 day ago

Yes, it's better on the Spark, but the M5 is a lot closer than before thanks to its neural accelerators. After prompt processing, token generation on the M5 Max is 2.3x faster than on the Spark.

No Apple markup, but you get the Nvidia markup instead. Prior to the recent Apple price increase due to the RAM shortage, an M5 Max 128GB was a bargain if you wanted to run local LLMs.

I can get 2.5 Sparks for the price of the M5, and they will have better throughput and access to bigger models (more VRAM when running tensor parallel).