Run Stable Diffusion on Intel CPUs

For those who want to know before installing: on my 11th Gen Intel(R) Core(TM) i7-1165G7 @ 2.80GHz, it consumes 13 GB of RAM and takes 4 minutes to generate an image with 32 steps (~2 min for 16 steps).

  • On my Intel i5-4590T (8G RAM) it takes around 5-6 minutes to generate with 32 steps, swapping to disk as it does consume around 13G memory total. You don't get real-time feedback but it's very usable and fun to play with. I wish there was an option to force a manual seed though.

  • On Linux/i5-1135G7 - takes 3min very consistently for 32 steps. Memory use: ~13.5gb VIRT, 9.3gb RES.

Working in WSL (Windows 10) Ubuntu on Ryzen 5600X; uses ~11GB of RAM and takes 2m04s with the default settings.

This is the first time I've played with a text-to-image model. I was aware that so-called "prompt engineering" can be tricky, but it's wild to see it for myself. A single space character can be the difference between garbage (or nightmare-fuel) output and output that captures the spirit of the prompt pretty well.

  • > A single space character can be the difference between garbage (or nightmare-fuel) output and output that captures the spirit of the prompt pretty well.

    It shouldn't, really. Have you tried generating a few images with each prompt, both with and without the space?

    Even with the same prompt, you can get a wide variety of quality.

  • > Ryzen 5600X

    Ooh, I've got one of those! I've been getting by trying to run it on my PC, which for various reasons currently has a 5800 and a GTX 1050 4GB; that can just barely handle optimizedSD at 90s/image but runs out of memory if I try to use the popular webui repo. Swapping to the 5600X might be worth it!

  • That's very surprising and shouldn't be the case in general (the exception being things like compound words or spelling errors maybe).

    Do you have some examples?

    Are you fixing the random seed? If not, the variation is more likely due to the seed than to a single space. (A sketch of pinning the seed follows below.)
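
    For anyone driving the model from Python directly, a minimal sketch of pinning the seed. This assumes the pipeline draws its initial latents from NumPy's global RNG; the repo's actual entry point may differ:

      import numpy as np

      seed = 42
      np.random.seed(seed)  # pin the RNG before the initial latents are sampled
      # a 4-channel 64x64 latent grid corresponds to a 512x512 output image
      latents = np.random.randn(1, 4, 64, 64).astype(np.float32)

    With the seed and prompt held fixed, runs are reproducible, so a with/without-space comparison actually isolates the prompt change.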

What OS does this need? Using Ubuntu 20.04, I'm getting stuck on openvino:

> $ pip install -r requirements.txt
> Could not find a version that satisfies the requirement openvino==2022.1.0 (from -r requirements.txt (line 6)) (from versions: 2021.4.0, 2021.4.1, 2021.4.2)

I even upgraded to python3.9, which, inexplicably, is required but not available in the "supported" OS.

EDIT: apparently it requires a version of pip that's newer than the one bundled with Ubuntu.
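
For reference, a minimal sketch of that fix, assuming Python 3.9 is the interpreter you want the packages installed for:

  python3.9 -m pip install --upgrade pip
  python3.9 -m pip install -r requirements.txt

My guess (an assumption, the thread doesn't confirm it) is that Ubuntu 20.04's bundled pip predates the newer manylinux wheel tags, so it simply doesn't see the openvino 2022.1.0 wheel until pip is upgraded.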

I've set up a Discord Bot that turns your text prompt into images using Stable Diffusion.

You can invite the bot to your server via https://discord.com/api/oauth2/authorize?client_id=101337304...

Talk to it using the /draw Slash Command.
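
For the curious, a rough sketch of how a /draw command can be wired up with discord.py 2.x. Everything here is hypothetical, not the author's actual bot; generate_image stands in for the call into Stable Diffusion:

  import discord
  from discord import app_commands

  intents = discord.Intents.default()
  client = discord.Client(intents=intents)
  tree = app_commands.CommandTree(client)

  @tree.command(name="draw", description="Generate an image from a text prompt")
  async def draw(interaction: discord.Interaction, prompt: str):
      # generation takes tens of seconds; defer to dodge the 3s interaction timeout
      await interaction.response.defer()
      path = generate_image(prompt)  # hypothetical helper; run it in an executor in real code
      await interaction.followup.send(file=discord.File(path))

  @client.event
  async def on_ready():
      await tree.sync()  # registers the slash command with Discord

  client.run("BOT_TOKEN")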

It's very much a quick weekend hack, so no guarantees whatsoever. Not sure how long I can afford the AWS g4dn instance, so get it while it's hot.

PS: Anyone know where to host reliable NVIDIA-equipped VMs at a reasonable price?

Then again - how far away are we from having it on M1/M2 Macs, at least with regular CPU processing? OpenVINO may be one path, I suppose: https://github.com/openvinotoolkit/openvino/issues/11554

Great work! Is there a similar project for (local) text generation (NLP) on a CPU plus lots of RAM? I mean something transformer-based and of similar quality to GPT-3 (i.e. better than GPT-2). I understand that each prompt would take almost forever to complete, but I'm still curious whether something like that exists.

  • Yes. Fabrice Bellard wrote a highly optimised library (libnc) [1] for training and inference of neural networks on CPU (x86 with AVX2), and implemented GPT-2 inference (gpt2tc) with it [2]. Later he added a CUDA backend to libnc. You can try it out at his website TextSynth [3], and I see it now runs various newer GPT-based models too, but it seems he hasn't released the code for those. That doesn't surprise me, as he didn't release the code for libnc either - just the parts of gpt2tc excluding libnc (libnc is released as a free binary) - so someone could reimplement GPT-J and the other models themselves.

    Incidentally, he's currently leading the Large Text Compression Benchmark with a neural-network-based compressor called nncp [4], which builds on this work. It trains its transformer model as it compresses, and the earlier versions didn't even use a GPU.

    [1] https://bellard.org/libnc/

    [2] https://bellard.org/libnc/gpt2tc.html

    [3] https://textsynth.com/

    [4] http://www.mattmahoney.net/dc/text.html#1085

    • Yet another gem by the genius Fabrice B!

      I kinda understand why he would not release the source code. Perhaps he's finally decided to monetize some of his coding skills. Maybe in the future he'll start releasing some of those newer and bigger models to the public, given that other big corps like FB have already started doing so (GPT-NeoX and OPT, as mentioned in the sibling comment by infinityio).

  • I've had success with GPT-J (6B) [0] and GPT-NeoX (20B) [1], but they probably aren't quite at the quality level you want

    On the other hand, Facebook has recently released the weights for a few sizes of their OPT model [2]. I haven't tried it, but that might be worth looking into, because they claim that their model is comparable to Davinci

    Note that for CPU inference you can't use float16 datatypes; load the weights in float32, otherwise it might error out (see the sketch after the links below)

    [0] https://huggingface.co/EleutherAI/gpt-j-6B [1] https://huggingface.co/EleutherAI/gpt-neox-20b [2] https://huggingface.co/facebook/opt-66b
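
    A minimal sketch of CPU inference with Hugging Face transformers, loading in float32. The prompt is made up, and expect GPT-J at this precision to want roughly 25 GB of RAM (my estimate, not from the thread):

      from transformers import AutoModelForCausalLM, AutoTokenizer
      import torch

      # float16 kernels are largely unimplemented on CPU, so load in float32
      tok = AutoTokenizer.from_pretrained("EleutherAI/gpt-j-6B")
      model = AutoModelForCausalLM.from_pretrained(
          "EleutherAI/gpt-j-6B", torch_dtype=torch.float32
      )

      inputs = tok("The answer to life, the universe, and everything is", return_tensors="pt")
      out = model.generate(**inputs, max_new_tokens=40, do_sample=True)
      print(tok.decode(out[0], skip_special_tokens=True))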

What's the status of running SD on AMD GPUs?

Where can I get up to speed on what's coming down the pipeline in this AI/ML image-making scene?

(And learn the agreed-upon terms)

7m12s on an ancient Intel Core i5-3350P CPU @ 3.10GHz (!) using BERT BasicTokenizer, default arguments

On Reddit I found that some older GPUs take about 5 minutes, and this video [1] says 5 minutes for CPU using this OpenVINO library. Not sure if OpenVINO makes CPU chips compete with GPUs. Has anyone heard of OpenVINO before?

[1] https://youtu.be/5iXhhf7ILME
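
For context: OpenVINO is Intel's inference toolkit. It generally won't make a CPU match a GPU, but it speeds up CPU inference by running models exported to its IR format with device-specific graph optimizations. A minimal sketch of the runtime API (model.xml and the input shape are placeholders):

  from openvino.runtime import Core
  import numpy as np

  core = Core()
  model = core.read_model("model.xml")         # IR format: .xml topology + .bin weights
  compiled = core.compile_model(model, "CPU")  # optimizes the graph for the target device
  result = compiled.infer_new_request({0: np.zeros((1, 3, 224, 224), dtype=np.float32)})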

I'm curious about what makes this project special, since there are a lot of similar implementations of diffusion models based on PyTorch/TF. Is it that it uses the CPU itself to run the diffusion process?

  • Yeah. For something like this you'd ideally want a powerful GPU with 12-24 GB of VRAM. If you have at least something like an RTX 2070, you probably don't need this and could do a lot more steps a lot faster on the GPU, but it's great for those who don't have that option.

The most powerful device I have is an iPad Pro (M1, 16 GB RAM). Can I run this on that thing at all?

Can't get it to install requirements on Windows with Python 3.10 and MS Build Tools 2022. Any tips?

  • Python 3.10 will fail on openvino. I used these steps, in an Anaconda prompt, after cd'ing to the destination folder:

      conda create --name py38 python=3.8
      conda activate py38
      conda update --all
      conda install openvino-ie4py -c intel
      pip install -r requirements.txt

    I also had to edit stable_diffusion.py: in the #decoder area, I changed vae.xml and vae.bin to vae_decoder.xml and vae_decoder.bin respectively. Roughly, the change is sketched below.
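
    A hypothetical illustration of that edit (the repo's actual code will differ in naming; this is just the shape of the change):

      # before: vae = core.read_model(model="vae.xml", weights="vae.bin")
      vae = core.read_model(model="vae_decoder.xml", weights="vae_decoder.bin")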

    From there I could run:

    python stable_diffusion.py --prompt "Street-art painting of Emma Stone dancing, in picasso style"

    For img2img, use this (note that it's a DIFFERENT program, demo.py):

    python demo.py --prompt "astronaut with jetpack floating in space with earth below" --init-image ./data/jomar.jpg --strength 0.5

web demo for stable diffusion (txt2img): https://huggingface.co/spaces/stabilityai/stable-diffusion

github with web ui: https://github.com/hlky/stable-diffusion

dev repo (more features, may have bugs): https://github.com/hlky/stable-diffusion-webui

repo with docker: https://github.com/AbdBarho/stable-diffusion-webui-docker

colab repo (new): https://github.com/altryne/sd-webui-colab

can also run it in colab (includes img2img): https://colab.research.google.com/drive/1NfgqublyT_MWtR5Csmr

demo made with gradio: https://github.com/gradio-app/gradio

Based on the FAQ (https://laion.ai/faq/) of the dataset that was used for training https://huggingface.co/spaces/stabilityai/stable-diffusion:

   LAION datasets are simply indexes to the internet, i.e. lists of URLs to the original images together with the ALT texts found linked to those images. While we downloaded and calculated CLIP embeddings of the pictures to compute similarity scores between pictures and texts, we subsequently discarded all the photos. Any researcher using the datasets must reconstruct the images data by downloading the subset they are interested in. For this purpose, we suggest the img2dataset tool.
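
(As an aside, the img2dataset step the FAQ refers to looks roughly like this; the flags are from my reading of the tool's README, and laion_subset.parquet is a placeholder:

  img2dataset --url_list laion_subset.parquet --input_format parquet \
      --url_col URL --caption_col TEXT --output_folder laion_images

i.e. the "dataset" ships as URL/caption metadata, and each user re-downloads the images themselves.)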

I love the "*simply*", but doesn't it mean that (depending on country, laws etc., but generally):

1. The LAION group committed possible copyright infringements and even left undeniable evidence that they did - on top of their written testimony (dumping the "stolen goods" into the river does not make the infringement undone, does it?)

2. Any model trained on the "linked" data may commit copyright infringement.

3. As a consequence, you, the user of generated images, may be liable.

I always wonder how this can possibly be legal at all. As a human artist, if I were to copy material and remix it without proper permission, I would be liable (again, depending on the situation); but suddenly ML is around the corner and it's all great, and now you can keep remixing the potentially problematic output further - no questions asked!?

I guess there are no precedent cases yet, but why should an automaton/software (and its creators) be judged differently from persons? I don't want to spoil the fun, but what am I missing?

I'm also disappointed that this dataset didn't make sure to collect only unproblematic content, e.g. under Creative Commons licenses that allow remixing. It would be a hell of an attribution list, but definitely better than what is presented here.

EDIT: Formatting

EDIT2: I actually followed one of the projects mentioned, not the linked repository. Clarified above.

  • If these AIs were actually just "remixing" and creating collages, then perhaps I would agree with you... but there is no exact pixel data stored here. This is fairly obvious when you consider that Stable Diffusion was trained on 100 terabytes of images, yet the actual model file is 4 GB.

    Now I'm not saying that nothing created by these AIs should be considered copyright infringement. As a human artist, you are not judged on your process, you are judged on the end results. The same should be done for the works created by these AIs.

  • Bad cases make bad law - if you argue too hard in the direction of "any copyrighted material in the AI's training set makes it copyrighted" this could lead to, say, "Disney owns any animated movie made by someone who watched a Disney movie".

    You can make an AI that doesn't memorize a specific training input; similarly you could probably make one that intentionally memorizes them. Both of these seem useful.

    It's not simply a given that using copyrighted material to train a model is copyright violation.

    In my view it isn't. No one image contributes a significant amount, and the process the machine performs is analogous to what a human does when learning.

    • It is likely legal, but is it ethical? If it is not ethical, should it be legal?

      We do tend to treat humans differently based on them being sentient beings with a limited lifespan, not machines.

  • I'm all for having these models scrutinized for copyright violations (and possibly amending copyright laws), but this comment is nothing but low-effort FUD.

  • Is training the model infringement or is distributing the model infringement?

    What if you trained the model and only distributed generated images?

    Is a human making art "in the style of" also infringement?

    • Legally, it is uncharted territory on many levels. I think there are good arguments to be made that these systems violate the intent behind copyright and trademarks, but not necessarily the laws.