Comment by hadlock
4 months ago
Consumer devices are already available that offer 128 GB specifically labeled for AI use. I think server-side AI will still exist for IoT devices, but I agree, 10 years seems like a pretty reasonable timeline to be able to buy an RTX 5080-sized card with 1 TB of memory, with the ability to pair it with another one for 2 TB. For local, non-distributed use, GPUs are already more than capable of doing 20+ tokens/s (rough math below); we're mostly waiting on 512 GB devices to drop in price, and "free" LLMs to get better.
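A back-of-the-envelope sketch of why bandwidth isn't the bottleneck: single-user decode is roughly memory-bandwidth bound, since every generated token reads all the weights once, so tokens/s ≈ bandwidth / model size in bytes. The specific numbers below are my assumptions, not measured specs:

    # rough single-user decode estimate, assuming decode is memory-bandwidth bound
    # (bandwidth and model size below are illustrative assumptions)
    bandwidth_gb_s = 960   # a 5080-class card, ~960 GB/s
    model_size_gb = 40     # a ~70B-parameter model at ~4-bit quantization
    tokens_per_s = bandwidth_gb_s / model_size_gb
    print(f"~{tokens_per_s:.0f} tokens/s")   # ~24 tokens/s

So current cards already have the bandwidth; it's the capacity to hold bigger models that's missing.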
Are we constrained by RAM production?
RAM price per GB is projected to decline at about 15% per annum.
At that rate it's quite a few years before you get double the RAM for the same money.
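To put a number on "quite a few": with a 15% annual price decline (the projection above), the doubling time for RAM per dollar works out to about 4.3 years:

    import math
    # years for price/GB to halve, i.e. double the RAM at fixed spend,
    # assuming the projected 15% annual decline holds
    annual_decline = 0.15
    years_to_halve = math.log(0.5) / math.log(1 - annual_decline)
    print(f"~{years_to_halve:.1f} years")   # ~4.3 years

So going from today's capacities to 1 TB purely on price declines would take several doublings, i.e. well over a decade, unless production scales up faster.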
For mobile, I'd guess power constraints matter too.
My guess was that Nvidia is limiting memory size on consumer cards to avoid cannibalizing their commercial/industrial sales. I see no reason a 5060 or 5070 couldn't come with 64/128/512 GB of memory other than an intentional decision not to offer those sizes. I don't need a 5090, since I don't need more than ~20-40 tokens/s for a 1-4 user household system.