Comment by fulafel
11 hours ago
> Lots of models will run on 8-12 GB 1080-generation GPUs onwards, or on Macs with similar memory, which are probably the bottom end from a GPU power perspective anyway.
Not the bottom end - most people are on laptops or mobile devices with much less GPU power than this.
Probably the bottom end an individual would want to consider using, given how slow generation gets below that.
Sure, you could theoretically take a model compressed in this manner and deploy it on an old netbook and run the calculations on the CPU, but each image would probably take an hour…
My laptop has a Pascal-era Nvidia GPU with 4GiB of VRAM. It's not very efficient, but it can do these tasks a whole lot faster than the CPU. The 4GiB of VRAM pretty much restricts it to only the tiniest models, though.
If this model can run inside of the 4GiB limit, that makes this infinitely more useful than existing models for me.
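For a rough sense of what "inside of the 4GiB limit" means, here is a back-of-the-envelope sketch. The parameter count, the bit widths and the 1 GiB allowance for activations and runtime overhead are all assumptions picked for illustration, not figures for any particular model:

```python
GIB = 1024 ** 3

def weight_footprint_gib(n_params: int, bits_per_weight: float) -> float:
    """Approximate size of the weights alone, in GiB."""
    return n_params * bits_per_weight / 8 / GIB

def fits_in_vram(n_params: int, bits_per_weight: float,
                 vram_gib: float = 4.0, overhead_gib: float = 1.0) -> bool:
    """True if weights plus an assumed fixed overhead fit in vram_gib.

    overhead_gib is a guess covering activations, framework buffers, etc.
    """
    return weight_footprint_gib(n_params, bits_per_weight) + overhead_gib <= vram_gib

# Hypothetical 1-billion-parameter model at two precisions.
n_params = 1_000_000_000
for bits in (32, 16):
    size = weight_footprint_gib(n_params, bits)
    print(f"{bits}-bit: {size:.2f} GiB weights, fits in 4GiB: {fits_in_vram(n_params, bits)}")
```

Under those assumptions the 32-bit weights alone take about 3.7 GiB and don't fit once overhead is added, while the 16-bit version at about 1.9 GiB does. Real memory use depends heavily on the runtime and the resolution or batch size, so treat this as a first filter only.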
I was thinking more about the 0-3 year old midrange x86 laptops and phones. They have unified-memory GPUs that are easily worth using (vs. the CPU) and support narrow FP datatypes, but don't have a ton of memory bandwidth.
Fair enough :)