Comment by hadlock
15 hours ago
Consumer devices are already available that offer 128 GB specifically labeled for AI use. I think server-side AI will still exist for IoT devices, but I agree, 10 years seems like a reasonable timeline to buy an RTX 5080-sized card that will have 1 TB of memory, with the ability to pair it with another one for 2 TB. For local, non-distributed use, GPUs are already more than capable of doing 20+ tokens/s; we're mostly waiting on 512 GB devices to drop in price, and for "free" LLMs to get better.
Are we constrained by RAM production?
RAM price per GB is projected to decline at about 15% per annum.
That's quite a few years before the same money buys double the RAM: at a 15% annual price decline, it takes roughly 4-5 years per doubling.
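A quick back-of-the-envelope check of that doubling time, assuming the 15% annual decline compounds steadily (the rate itself is the speculative input here):

```python
import math

# Years for price per GB to halve (i.e., GB per dollar to double),
# assuming a steady 15% annual price decline.
annual_decline = 0.15  # assumed rate from the comment above
doubling_years = math.log(2) / -math.log(1 - annual_decline)
print(f"~{doubling_years:.1f} years per doubling")  # ~4.3 years
```

Under that assumption, going from today's 128 GB to a 1 TB card at the same price point is three doublings, so on the order of 12-13 years, which is in the same ballpark as the 10-year guess above.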
For mobile, I'm guessing power constraints matter too.