Comment by Gracana
1 month ago
I thought paging was so inefficient that it wasn't worth doing compared to running CPU inference on the parts of the model that sit in system memory. Maybe if you have a good GPU and a turtle of a CPU, but still somehow have the memory bandwidth to make shuffling data in and out of the GPU worthwhile? I'm curious to know who is doing this and why.
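The tradeoff the comment gestures at can be sketched with rough arithmetic: token-by-token decoding is memory-bound, so each layer's weights get read once per token, either over PCIe (paging) or from DRAM (CPU inference). All figures below are illustrative assumptions, not measurements of any real system.

```python
# Back-of-envelope: paging a layer's weights to the GPU vs. letting
# the CPU read them from DRAM. Numbers are assumed, not measured.

layer_bytes = 2 * 1024**3       # assumed ~2 GiB of weights per layer

pcie_bw = 32e9                  # assumed PCIe 4.0 x16, ~32 GB/s
cpu_dram_bw = 60e9              # assumed desktop DRAM, ~60 GB/s

# Per-token decode is bandwidth-bound: the weights are streamed once
# either way, so the comparison is roughly PCIe vs. DRAM bandwidth.
t_page = layer_bytes / pcie_bw       # time to ship the layer over PCIe
t_cpu = layer_bytes / cpu_dram_bw    # time for the CPU to read it once

print(f"paging: {t_page*1e3:.1f} ms/layer, CPU: {t_cpu*1e3:.1f} ms/layer")
```

With these assumed numbers the CPU path wins, which is the comment's intuition: paging only pays off when the interconnect is fast relative to system memory, or when batching lets one transfer serve many tokens.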