Comment by ajb
20 hours ago
The thing that is supposed to happen next is high-bandwidth flash. In theory, it could allow laptops to run the larger models without being extortionately costly, by loading directly from flash into the GPU (not by executing in flash). But I haven't seen figures for the actual bandwidth yet, and no doubt it will be expensive to start with. The underlying flash technology has much higher read latency than DRAM, so it's not really clear (to me, at least) whether increasing parallelism alone can deliver the speeds needed to remove the need to cache in VRAM.
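To put rough shape on the bandwidth-versus-latency question, here is a back-of-the-envelope sketch in Python. Every number in it is an illustrative assumption of mine (model size, target bandwidth, flash read latency, page size), not a figure for any real high-bandwidth flash part. It uses two simple relations: decode speed is bounded by bandwidth divided by the bytes read per token, and, by Little's law, the data that must be in flight to sustain a given bandwidth is bandwidth times latency.

```python
import math


def tokens_per_second(model_bytes: float, bandwidth_bytes_per_s: float) -> float:
    """Upper bound on decode rate, assuming every weight is read once per token."""
    return bandwidth_bytes_per_s / model_bytes


def bytes_in_flight(bandwidth_bytes_per_s: float, latency_s: float) -> float:
    """Little's law: bytes that must be outstanding to sustain the bandwidth."""
    return bandwidth_bytes_per_s * latency_s


def parallel_reads_needed(bandwidth_bytes_per_s: float, latency_s: float,
                          read_size_bytes: int) -> int:
    """Number of concurrent reads of a given size needed to hide the latency."""
    in_flight = bytes_in_flight(bandwidth_bytes_per_s, latency_s)
    return math.ceil(in_flight / read_size_bytes)


if __name__ == "__main__":
    # All values below are hypothetical, chosen only to show the arithmetic.
    model_bytes = 40e9        # assumed: a 40 GB set of weights
    bandwidth = 400e9         # assumed: 400 GB/s target read bandwidth
    flash_latency = 50e-6     # assumed: 50 microsecond flash read latency
    page_size = 16 * 1024     # assumed: 16 KiB read unit

    print("tokens/s upper bound:", tokens_per_second(model_bytes, bandwidth))
    print("bytes in flight:", bytes_in_flight(bandwidth, flash_latency))
    print("concurrent reads:", parallel_reads_needed(bandwidth, flash_latency, page_size))
```

The point of the sketch is only that higher latency does not cap bandwidth by itself; it raises the number of reads that have to be outstanding at once. Whether a real device can keep that many reads in flight at an acceptable cost is exactly the open question.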