Comment by hadlock

4 months ago

Consumer devices are already available that offer 128 GB specifically labeled for AI use. I think server-side AI will still exist for IoT devices, but I agree, 10 years seems like a reasonable timeline to buy an RTX 5080-sized card with 1 TB of memory, with the ability to pair it with another one for 2 TB. For local, non-distributed use, GPUs are already more than capable of doing 20+ tokens/s; we're mostly waiting on 512 GB devices to drop in price and "free" LLMs to get better.
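As a rough sanity check on the 20+ tokens/s claim: single-user decode speed is approximately memory bandwidth divided by the bytes read per generated token, since each token streams (roughly) all the weights through memory once. A minimal sketch; the bandwidth and model-size numbers below are illustrative assumptions, not measurements:

```python
# Back-of-envelope LLM decode throughput:
#   tokens/s ≈ memory bandwidth / model size in bytes
# Numbers below are illustrative assumptions, not benchmarks.

def decode_tokens_per_s(bandwidth_gb_s: float, params_b: float, bytes_per_param: float) -> float:
    model_bytes_gb = params_b * bytes_per_param  # model footprint in GB
    return bandwidth_gb_s / model_bytes_gb

# e.g. a ~1 TB/s consumer GPU running a 70B-parameter model
# quantized to 4 bits (~0.5 bytes/param, ~35 GB footprint)
print(decode_tokens_per_s(1000, 70, 0.5))  # ≈ 28.6 tokens/s
```

On that arithmetic, today's consumer GPUs clear 20 tokens/s comfortably; capacity, not compute, is the binding constraint.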

Are we constrained by RAM production?

RAM price per GB is projected to decline at about 15% per annum.

That's quite a few years before you'll get double the RAM for the same money.
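For concreteness: at a constant 15% annual decline, price per GB halves after log(0.5)/log(0.85) ≈ 4.3 years, so RAM-per-dollar doubles roughly every four-plus years. A quick check, using the 15% figure quoted above:

```python
import math

# Years until price per GB halves, given a constant annual decline rate.
# With a 15% decline, price after t years is 0.85**t of today's price,
# so solve 0.85**t = 0.5 for t.
decline = 0.15
years_to_halve = math.log(0.5) / math.log(1 - decline)
print(years_to_halve)  # ≈ 4.27 years per doubling of RAM-per-dollar
```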

For mobile, I'm guessing power constraints matter too.

  • My guess is that Nvidia is limiting memory size on consumer cards to avoid cannibalizing its commercial/industrial sales. I see no reason a 5060 or 5070 couldn't ship with 64/128/512 GB of memory other than an intentional decision not to support those sizes. I don't need a 5090, because I don't need more than ~20-40 tokens/s for a 1-4 user household system.