Comment by HeWhoLurksLate

1 day ago

but what about second GPU?

Multi-GPU was tried by the whole industry, including Apple (most notably with the trash-can Mac Pro). Despite significant investment, it ultimately failed for consumer workloads like gaming and was relegated to the datacenter and, depending on the workload, some very high-end workstations.

Multi-GPU has recently experienced a resurgence due to the discovery of a new workload with broad appeal (LLMs), but that's too recent to have significantly influenced hardware architectures, and LLM inference isn't the most natural thing to scale across many GPUs anyway (see the sketch below). Everybody is still competing with more or less the architectures they had on hand when LLMs arrived, with new low-precision matrix-math units squeezed in wherever room can be made.

It's not at all clear yet how the long-term balance between local and cloud compute for inference will settle, whether there will be any local training or fine-tuning at all, or which use cases will ultimately be profitable. All of that influences whether it would be worthwhile for Apple to abandon their current client-first architecture, which standardizes on a single integrated GPU and forgoes the complexity of multi-GPU setups.
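To make the scaling point concrete, here's a minimal sketch of why splitting inference across GPUs is awkward. It simulates Megatron-style tensor parallelism for one transformer MLP block, with plain numpy arrays standing in for devices; the device count, shapes, and shard layout are all made up for illustration, not anyone's actual implementation:

    # Minimal sketch: tensor-parallel inference for one MLP block,
    # with numpy "shards" standing in for GPUs. Hypothetical sizes.
    import numpy as np

    N_GPUS = 4
    D_MODEL, D_FF = 512, 2048

    rng = np.random.default_rng(0)
    x = rng.standard_normal(D_MODEL)               # activations, replicated on every device
    w_up = rng.standard_normal((D_MODEL, D_FF))    # full up-projection weight
    w_down = rng.standard_normal((D_FF, D_MODEL))  # full down-projection weight

    # Shard the weights: w_up column-wise, w_down row-wise
    # (the classic Megatron-style split).
    w_up_shards = np.split(w_up, N_GPUS, axis=1)
    w_down_shards = np.split(w_down, N_GPUS, axis=0)

    # Each "GPU" computes its partial result independently...
    partials = []
    for up, down in zip(w_up_shards, w_down_shards):
        h = np.maximum(x @ up, 0.0)   # local slice of the hidden layer (ReLU for simplicity)
        partials.append(h @ down)     # partial output, shape (D_MODEL,)

    # ...but producing the layer's output requires an all-reduce (summing
    # across all devices) before the NEXT layer can even start. On real
    # hardware this is a synchronization over the interconnect after every
    # single layer.
    y = np.sum(partials, axis=0)

    # Sanity check: the sharded computation matches the single-device result.
    y_ref = np.maximum(x @ w_up, 0.0) @ w_down
    assert np.allclose(y, y_ref)

The math works out identically, but that per-layer all-reduce is the tax: every layer's output has to cross the interconnect before the next layer can run, which is why multi-GPU inference lives or dies on interconnect bandwidth, and why a single big integrated GPU sidesteps the problem entirely.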