Run Stable Diffusion on Intel CPUs

For those who want to know before installing: on my 11th Gen Intel(R) Core(TM) i7-1165G7 @ 2.80GHz, it consumes 13 GB of RAM and takes 4 minutes to generate an image with 32 steps (~2 min for 16 steps).

  • On my Intel i5-4590T (8G RAM) it takes around 5-6 minutes to generate with 32 steps, swapping to disk as it does consume around 13G memory total. You don't get real-time feedback but it's very usable and fun to play with. I wish there was an option to force a manual seed though.

  • On Linux/i5-1135G7 - takes 3min very consistently for 32 steps. Memory use: ~13.5gb VIRT, 9.3gb RES.

Working in WSL (Windows 10) Ubuntu on Ryzen 5600X; uses ~11GB of RAM and takes 2m04s with the default settings.

This is the first time I've played with a text-to-image model. I was aware that so-called "prompt engineering" can be tricky, but it's wild to see it for myself. A single space character can be the difference between garbage (or nightmare-fuel) output and output that captures the spirit of the prompt pretty well.

  • > A single space character can be the difference between garbage (or nightmare-fuel) output and output that captures the spirit of the prompt pretty well.

    It shouldn't, really. Have you tried generating a few images with each prompt, both with and without the space?

    Even with the same prompt, you can get a wide variety of quality.

  • > Ryzen 5600X

    Ooh, I've got one of those! I've been getting by trying to run it on my PC, which for various reasons currently has a 5800 and a GTX 1050 4GB; that can just barely handle optimizedSD at 90s/image but runs out of memory if I try to use the popular webui repo. Swapping to the 5600X might be worth it!

  • That's very surprising and shouldn't be the case in general (the exception being things like compound words or spelling errors maybe).

    Do you have some examples?

    Are you fixing the random seed? If not, the variation is more likely due to the seed than to a single space. (A sketch of pinning the seed follows below.)
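
    For anyone driving the model from Python directly, a minimal sketch of pinning the seed. This assumes the pipeline draws its initial latents from NumPy's global RNG; the repo's actual entry point may differ:

      import numpy as np

      seed = 42
      np.random.seed(seed)  # pin the RNG before the initial latents are sampled
      # a 4-channel 64x64 latent grid corresponds to a 512x512 output image
      latents = np.random.randn(1, 4, 64, 64).astype(np.float32)

    With the seed and prompt held fixed, runs are reproducible, so a with/without-space comparison actually isolates the prompt change.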

What OS does this need? Using Ubuntu 20.04, I'm getting stuck on openvino:

> $ pip install -r requirements.txt
> Could not find a version that satisfies the requirement openvino==2022.1.0 (from -r requirements.txt (line 6)) (from versions: 2021.4.0, 2021.4.1, 2021.4.2)

I even upgraded to python3.9, which, inexplicably, is required but not available in the "supported" OS.

EDIT: apparently it requires a version of pip that's newer than the one bundled with Ubuntu.
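
For reference, a minimal sketch of that fix, assuming Python 3.9 is the interpreter you want the packages installed for:

  python3.9 -m pip install --upgrade pip
  python3.9 -m pip install -r requirements.txt

My guess (an assumption, the thread doesn't confirm it) is that Ubuntu 20.04's bundled pip predates the newer manylinux wheel tags, so it simply doesn't see the openvino 2022.1.0 wheel until pip is upgraded.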

I've set up a Discord Bot that turns your text prompt into images using Stable Diffusion.

You can invite the bot to your server via https://discord.com/api/oauth2/authorize?client_id=101337304...

Talk to it using the /draw Slash Command.
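
For the curious, a rough sketch of how a /draw command can be wired up with discord.py 2.x. Everything here is hypothetical, not the author's actual bot; generate_image stands in for the call into Stable Diffusion:

  import discord
  from discord import app_commands

  intents = discord.Intents.default()
  client = discord.Client(intents=intents)
  tree = app_commands.CommandTree(client)

  @tree.command(name="draw", description="Generate an image from a text prompt")
  async def draw(interaction: discord.Interaction, prompt: str):
      # generation takes tens of seconds; defer to dodge the 3s interaction timeout
      await interaction.response.defer()
      path = generate_image(prompt)  # hypothetical helper; run it in an executor in real code
      await interaction.followup.send(file=discord.File(path))

  @client.event
  async def on_ready():
      await tree.sync()  # registers the slash command with Discord

  client.run("BOT_TOKEN")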

It's very much a quick weekend hack, so no guarantees whatsoever. Not sure how long I can afford the AWS g4dn instance, so get it while it's hot.

PS: Anyone know where to host reliable NVIDIA-equipped VMs at a reasonable price?

Then again - how far away are we from having it on M1/M2 Macs, at least with regular CPU processing? OpenVINO may be one path, I suppose: https://github.com/openvinotoolkit/openvino/issues/11554

Great work! Is there a similar project for (local) text generation (NLP) on a CPU plus lots of RAM? I mean something transformer-based and of similar quality to GPT-3 (i.e. better than GPT-2). I understand that each prompt would take almost forever to complete, but I'm still curious whether something like that exists.

  • Yes. Fabrice Bellard wrote a highly optimised library (libnc) [1] for training and inference of neural networks on CPU (x86 with AVX2), and implemented GPT-2 inference (gpt2tc) with it [2]. Later he added a CUDA backend to libnc. You can try it out at his website TextSynth [3], and I see it now runs various newer GPT-based models too, but it seems he hasn't released the code for those. That doesn't surprise me, as he didn't release the code for libnc either - just the parts of gpt2tc excluding libnc (libnc is released as a free binary) - so someone could reimplement GPT-J and the other models themselves.

    Incidentally, he's currently leading the Large Text Compression Benchmark with a neural-network-based compressor called nncp [4], which builds on this work. It trains its transformer model as it compresses, and the earlier versions didn't even use a GPU.

    [1] https://bellard.org/libnc/

    [2] https://bellard.org/libnc/gpt2tc.html

    [3] https://textsynth.com/

    [4] http://www.mattmahoney.net/dc/text.html#1085

    • Yet another gem by the genius Fabrice B!

      I kinda understand why he would not release the source code. Perhaps he's finally decided to monetize some of his coding skills. Maybe in the future he'll start releasing some of those newer and bigger models to the public, given that other big corps like FB have already started doing so (GPT-NeoX and OPT, as mentioned in the sibling comment by infinityio).

  • I've had success with GPT-J (6B) [0] and GPT-NeoX (20B) [1], but they probably aren't quite at the quality level you want

    On the other hand, Facebook has recently released the weights for a few sizes of their OPT model [2]. I haven't tried it, but that might be worth looking into, because they claim that their model is comparable to Davinci

    Note that for CPU inference you can't use float16 datatypes; load the weights in float32, otherwise it might error out (see the sketch after the links below)

    [0] https://huggingface.co/EleutherAI/gpt-j-6B [1] https://huggingface.co/EleutherAI/gpt-neox-20b [2] https://huggingface.co/facebook/opt-66b
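
    A minimal sketch of CPU inference with Hugging Face transformers, loading in float32. The prompt is made up, and expect GPT-J at this precision to want roughly 25 GB of RAM (my estimate, not from the thread):

      from transformers import AutoModelForCausalLM, AutoTokenizer
      import torch

      # float16 kernels are largely unimplemented on CPU, so load in float32
      tok = AutoTokenizer.from_pretrained("EleutherAI/gpt-j-6B")
      model = AutoModelForCausalLM.from_pretrained(
          "EleutherAI/gpt-j-6B", torch_dtype=torch.float32
      )

      inputs = tok("The answer to life, the universe, and everything is", return_tensors="pt")
      out = model.generate(**inputs, max_new_tokens=40, do_sample=True)
      print(tok.decode(out[0], skip_special_tokens=True))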

What's the status of running SD on AMD GPUs?

Where can I get up to speed on what's coming down the pipeline in this AI/ML image-making scene?

(And learn the agreed-upon terms)

7m12s on an ancient Intel Core i5-3350P CPU @ 3.10GHz (!) using BERT BasicTokenizer, default arguments

On Reddit I found that some older GPUs take about 5 minutes, and this video [1] says 5 minutes for CPU using this OpenVINO library. Not sure if OpenVINO makes CPU chips compete with GPUs. Has anyone heard of OpenVINO before?

[1] https://youtu.be/5iXhhf7ILME
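
For context: OpenVINO is Intel's inference toolkit. It generally won't make a CPU match a GPU, but it speeds up CPU inference by running models exported to its IR format with device-specific graph optimizations. A minimal sketch of the runtime API (model.xml and the input shape are placeholders):

  from openvino.runtime import Core
  import numpy as np

  core = Core()
  model = core.read_model("model.xml")         # IR format: .xml topology + .bin weights
  compiled = core.compile_model(model, "CPU")  # optimizes the graph for the target device
  result = compiled.infer_new_request({0: np.zeros((1, 3, 224, 224), dtype=np.float32)})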

I'm curious about what makes this project special, since there are a lot of similar implementations of diffusion models based on PyTorch/TF. Is it that it uses the CPU itself to run the diffusion process?

  • Yeah. For something like this you'd ideally want a powerful GPU with 12-24 GB of VRAM. If you have at least something like an RTX 2070, you probably don't need this and could do a lot more steps a lot faster on the GPU, but it's great for those who don't have that option.

The most powerful device I have is an iPad Pro (M1, 16 GB RAM). Can I run this on that thing at all?

Can't get it to install requirements on Windows with Python 3.10 and MS Build Tools 2022. Any tips?

  • Python 3.10 will fail on openvino. I used these steps, in an Anaconda prompt, after cd'ing to the destination folder:

      conda create --name py38 python=3.8
      conda activate py38
      conda update --all
      conda install openvino-ie4py -c intel
      pip install -r requirements.txt

    I also had to edit stable_diffusion.py: in the #decoder area, I changed vae.xml and vae.bin to vae_decoder.xml and vae_decoder.bin respectively. Roughly, the change is sketched below.
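
    A hypothetical illustration of that edit (the repo's actual code will differ in naming; this is just the shape of the change):

      # before: vae = core.read_model(model="vae.xml", weights="vae.bin")
      vae = core.read_model(model="vae_decoder.xml", weights="vae_decoder.bin")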

    From there I could run:

    python stable_diffusion.py --prompt "Street-art painting of Emma Stone dancing, in picasso style"

    For img2img, use this (note that it's a DIFFERENT program, demo.py):

    python demo.py --prompt "astronaut with jetpack floating in space with earth below" --init-image ./data/jomar.jpg --strength 0.5

web demo for stable diffusion (txt2img): https://huggingface.co/spaces/stabilityai/stable-diffusion

github with web ui: https://github.com/hlky/stable-diffusion

dev repo (more features, may have bugs): https://github.com/hlky/stable-diffusion-webui

repo with docker: https://github.com/AbdBarho/stable-diffusion-webui-docker

colab repo (new): https://github.com/altryne/sd-webui-colab

can also run it in colab (includes img2img): https://colab.research.google.com/drive/1NfgqublyT_MWtR5Csmr

demo made with gradio: https://github.com/gradio-app/gradio

Based on the FAQ (https://laion.ai/faq/) of the dataset that was used for training https://huggingface.co/spaces/stabilityai/stable-diffusion:

   LAION datasets are simply indexes to the internet, i.e. lists of URLs to the original images together with the ALT texts found linked to those images. While we downloaded and calculated CLIP embeddings of the pictures to compute similarity scores between pictures and texts, we subsequently discarded all the photos. Any researcher using the datasets must reconstruct the images data by downloading the subset they are interested in. For this purpose, we suggest the img2dataset tool.
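
(As an aside, the img2dataset step the FAQ refers to looks roughly like this; the flags are from my reading of the tool's README, and laion_subset.parquet is a placeholder:

  img2dataset --url_list laion_subset.parquet --input_format parquet \
      --url_col URL --caption_col TEXT --output_folder laion_images

i.e. the "dataset" ships as URL/caption metadata, and each user re-downloads the images themselves.)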

I love the "*simply*", but doesn't it mean that (depending on country, laws etc., but generally):

1. The LAION group committed possible copyright infringements and even left undeniable evidence that they did - on top of their written testimony (dumping the "stolen goods" into the river does not make the infringement undone, does it?)

2. Any model trained on the "linked" data may commit copyright infringement.

3. As a consequence, you, the user of generated images, may be liable.

I always wonder how this can possibly be legal at all. As a human artist, if I were to copy material and remix it without proper permission, I would be liable (again, depending on the situation); but suddenly ML is around the corner and it's all great, and now you can keep remixing the potentially problematic output further - no questions asked!?

I guess there are no precedent cases yet, but why should an automaton/software (and its creators) be judged differently from persons? I don't want to spoil the fun, but what am I missing?

I'm also disappointed that this dataset didn't make sure to collect only unproblematic content, e.g. under Creative Commons licenses that allow remixing. It would be a hell of an attribution list, but definitely better than what is presented here.

EDIT: Formatting

EDIT2: I actually followed one of the projects mentioned, not the linked repository. Clarified above.

  • If these AIs were actually just "remixing" and creating collages, then perhaps I would agree with you... but there is no exact pixel data stored here. This is fairly obvious when you consider that Stable Diffusion was trained on 100 terabytes of images, yet the actual model file is 4 GB.

    Now I'm not saying that nothing created by these AIs should be considered copyright infringement. As a human artist, you are not judged on your process, you are judged on the end results. The same should be done for the works created by these AIs.

  • Bad cases make bad law - if you argue too hard in the direction of "any copyrighted material in the AI's training set makes it copyrighted" this could lead to, say, "Disney owns any animated movie made by someone who watched a Disney movie".

    You can make an AI that doesn't memorize a specific training input; similarly you could probably make one that intentionally memorizes them. Both of these seem useful.

    It's not simply a given that using copyrighted material to train a model is copyright violation.

    In my view it isn't. No one image contributes a significant amount, and the process the machine performs is analogous to what a human does when learning.

    • It is likely legal, but is it ethical? If it is not ethical, should it be legal?

      We do tend to treat humans differently based on them being sentient beings with a limited lifespan, not machines.

  • I'm all for having these models scrutinized for copyright violations (and possibly amending copyright laws), but this comment is nothing but low-effort FUD.

  • Is training the model infringement or is distributing the model infringement?

    What if you trained the model and only distributed generated images?

    Is a human making art "in the style of" also infringement?

    • Legally, it is uncharted territory on many levels. I think there are good arguments to be made that these systems violate the intent behind copyright and trademarks, but not necessarily the laws.