Comment by wmf

9 hours ago

Memory bandwidth is the bottleneck in the Spark. If you replace the SoC with an optimized ASIC but keep the same 256-bit LPDDR5 interface, the performance will be the same. You can increase performance by using a wider memory bus, but that's also more expensive.

M3 Ultra has a 1024-bit memory bus (819 GB/s) and starts at $3,999 (96 GB of RAM). It can be done.
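For anyone who wants to check the arithmetic behind these figures, here is a rough sketch of how bus width maps to peak bandwidth. The transfer rates are assumptions, not numbers from this thread: 6400 MT/s is what reproduces the quoted 819 GB/s on a 1024-bit bus, and 8533 MT/s is a guess at the Spark's memory speed. The function name is made up for illustration.

```python
def peak_bandwidth_gbs(bus_width_bits: int, transfer_rate_mtps: float) -> float:
    """Theoretical peak memory bandwidth in GB/s (decimal gigabytes)."""
    bytes_per_transfer = bus_width_bits / 8          # bits -> bytes moved per transfer
    bytes_per_second = bytes_per_transfer * transfer_rate_mtps * 1e6
    return bytes_per_second / 1e9

# Assumed transfer rates (not stated in the thread).
m3_ultra_gbs = peak_bandwidth_gbs(1024, 6400)   # 1024-bit bus
spark_gbs = peak_bandwidth_gbs(256, 8533)       # 256-bit bus

print(f"1024-bit @ 6400 MT/s: {m3_ultra_gbs:.1f} GB/s")
print(f" 256-bit @ 8533 MT/s: {spark_gbs:.1f} GB/s")
```

These are theoretical peaks; sustained bandwidth in practice is lower.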

  • The tradeoff is that the M3 Ultra's GPU loses to laptop GPUs in compute benchmarks. All of that bandwidth sits idle while the compute-limited GPU works through token prefill.

    For inference workloads, it makes a lot more sense to optimize for prefill speed and time to first token (TTFT) before maxing out memory bandwidth.
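The prefill-versus-bandwidth argument can be put into rough numbers with a back-of-the-envelope model. This is only a sketch under common simplifying assumptions: decode reads every weight once per generated token, so it is bounded by memory bandwidth, while prefill costs about 2 FLOPs per parameter per prompt token, so it is bounded by compute. The model size, prompt length, and compute throughput below are hypothetical placeholders, not measurements of any device mentioned here.

```python
def decode_tokens_per_s(bandwidth_bytes_per_s: float, weight_bytes: float) -> float:
    """Upper bound on decode speed: each token streams all weights once."""
    return bandwidth_bytes_per_s / weight_bytes

def prefill_seconds(params: float, prompt_tokens: int, flops_per_s: float) -> float:
    """Lower bound on prefill time: ~2 FLOPs per parameter per prompt token."""
    return 2 * params * prompt_tokens / flops_per_s

# Hypothetical workload: 70B-parameter model with 4-bit weights (0.5 bytes/param).
params = 70e9
weight_bytes = params * 0.5

# 819 GB/s is the bandwidth quoted above; 50 TFLOPS is a made-up compute figure.
print(f"decode bound:  {decode_tokens_per_s(819e9, weight_bytes):.1f} tok/s")
print(f"prefill bound: {prefill_seconds(params, 8000, 50e12):.1f} s for 8000 tokens")
```

Under this model, adding bandwidth only raises the decode bound; the wait before the first token depends on compute throughput alone.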