Comment by kilroy123

9 hours ago

I hope to see something like this, but in a small form factor like the NVIDIA Spark.

I want a super-fast LLM with Opus 4.6-level ability or better.

7 comments

kilroy123

Memory bandwidth is the bottleneck in the Spark. If you replace the SoC with an optimized ASIC but keep the same 256-bit LPDDR5X memory, performance will be the same. You can increase performance with a wider memory bus, but that's also more expensive.
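
To put rough numbers on the bandwidth argument, here is a minimal back-of-the-envelope sketch. It assumes single-stream decoding has to read every model weight once per generated token, so memory bandwidth divided by model size gives an upper bound on tokens per second. The transfer rates (8533 MT/s for the 256-bit case, 6400 MT/s for a 1024-bit bus) and the 70 GB model size are illustrative assumptions, not figures from this thread, and the function names are made up for the example.

```python
def peak_bandwidth_gbs(bus_width_bits: int, transfer_rate_mts: float) -> float:
    """Theoretical peak bandwidth in GB/s: bytes per transfer times transfers per second."""
    bytes_per_transfer = bus_width_bits / 8
    return bytes_per_transfer * transfer_rate_mts * 1e6 / 1e9


def decode_tokens_per_s_upper_bound(bandwidth_gbs: float, weights_gb: float) -> float:
    """Upper bound on decode speed if each token requires one full pass over the weights."""
    return bandwidth_gbs / weights_gb


# Assumed configurations: (label, bus width in bits, transfer rate in MT/s).
configs = [
    ("256-bit bus @ 8533 MT/s", 256, 8533),
    ("1024-bit bus @ 6400 MT/s", 1024, 6400),
]

WEIGHTS_GB = 70  # hypothetical model: e.g. 70B parameters at 8 bits each

for label, width, rate in configs:
    bw = peak_bandwidth_gbs(width, rate)
    tps = decode_tokens_per_s_upper_bound(bw, WEIGHTS_GB)
    print(f"{label}: {bw:.1f} GB/s peak, at most {tps:.1f} tokens/s")
```

Under these assumptions the faster compute chip does not appear anywhere in the bound, which is the point being made: only a wider or faster memory interface moves the ceiling. Real systems land below it because peak bandwidth is never fully sustained.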

phonon 6 hours ago
M3 Ultra has a 1024-bit memory bus (819 GB/s) and starts at $3,999 (96GB of RAM). It can be done...
- bigyabai 6 hours ago
  
  The tradeoff is that the M3 Ultra's GPU loses to laptop GPUs in compute benchmarks. All of that bandwidth sits idle during token prefill, which is compute-bound.
  For inference workloads, it makes a lot more sense to optimize for prefill and time to first token (TTFT) before maxing out memory bandwidth.
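
A rough sketch of why prefill depends on compute rather than bandwidth: it uses the common approximation that a forward pass costs about 2 FLOPs per parameter per token (attention cost ignored). Every number here (70B parameters, an 8,000-token prompt, the sustained TFLOP/s values) is a hypothetical chosen for illustration, not a benchmark of any real chip.

```python
def prefill_seconds(params_billions: float, prompt_tokens: int, sustained_tflops: float) -> float:
    """Estimate time to first token, assuming ~2 FLOPs per parameter per prompt token."""
    total_flops = 2 * params_billions * 1e9 * prompt_tokens
    return total_flops / (sustained_tflops * 1e12)


# Hypothetical workload: 70B-parameter model, 8,000-token prompt.
PARAMS_B = 70
PROMPT_TOKENS = 8000

# Hypothetical sustained compute levels in TFLOP/s.
for tflops in (30, 100, 300):
    ttft = prefill_seconds(PARAMS_B, PROMPT_TOKENS, tflops)
    print(f"{tflops} TFLOP/s sustained: roughly {ttft:.1f} s to first token")
```

Memory bandwidth does not appear in this estimate at all, so under these assumptions a chip with a very wide bus but modest compute still waits just as long on a big prompt.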
  

smith7018 7 hours ago

Unfortunately, Sam Altman won't be the one to deliver us at-home hardware that can run Opus-level models.

blitzar 4 hours ago

I wonder what is happening with the OpenAI / Jony Ive crossover episode.

flyinglizard 6 hours ago

Forget about it. Datacenter-class hardware is getting farther and farther from desktop use. It’s not PCIe GPUs anymore.