← Back to context

Comment by ajb

18 hours ago

The thing that is supposed to happen next is high-bandwidth flash. In theory, it could allow laptops to run the larger models without being extortionately costly, by loading directly from flash into the GPU (not by executing in flash) But I haven't seen figures of the actual bandwidth yet, and no doubt to start with it will be expensive. The underlying technology of flash has much higher read latency than dram, so it's not really clear (to me, at least) if they can deliver the speeds needed to remove the need to cache in VRAM just by increasing parallelism.