For those who want to know before installing: on my 11th Gen Intel(R) Core(TM) i7-1165G7 @ 2.80GHz, it consumes 13 GB of RAM and takes 4 minutes to generate an image with 32 steps (~2 min for 16 steps).
On my Intel i5-4590T (8G RAM) it takes around 5-6 minutes to generate with 32 steps, swapping to disk as it does consume around 13G memory total. You don't get real-time feedback but it's very usable and fun to play with. I wish there was an option to force a manual seed though.
FYI they just added the seed option to the repo today. Now waiting for img2img...
On Linux/i5-1135G7 - takes 3min very consistently for 32 steps. Memory use: ~13.5gb VIRT, 9.3gb RES.
Working in WSL (Windows 10) Ubuntu on Ryzen 5600X; uses ~11GB of RAM and takes 2m04s with the default settings.
This is the first time I've played with a text-to-image model. I was aware that so-called "prompt engineering" can be tricky, but it's wild to see it for myself. A single space character can be the difference between garbage (or nightmare-fuel) output and output that captures the spirit of the prompt pretty well.
> A single space character can be the difference between garbage (or nightmare-fuel) output and output that captures the spirit of the prompt pretty well.
It really shouldn't. Have you tried generating a few images with each prompt, both with and without the space?
Even with the same prompt, you can get a wide variety of quality.
> Ryzen 5600X
Ooh, I've got one of those! I've been getting by trying to run it on my PC, which for various reasons currently has a 5800 and a GTX 1050 4GB, which can just barely handle optimizedSD at 90s/image but runs out of memory if I try to use the popular webui repo. Swapping to the 5600X might be worth it!
That's very surprising and shouldn't be the case in general (the exception being things like compound words or spelling errors maybe).
Do you have some examples?
Are you fixing the random seed? If not the variation is more likely to be that than a single space.
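For what it's worth, a fixed seed pins down the initial latent noise, which is what makes a run reproducible. A minimal NumPy sketch of the idea (illustrative only; the real pipeline uses its own RNG and latent shape):

```python
import numpy as np

def initial_latents(seed, shape=(4, 64, 64)):
    """Starting noise for a diffusion run, derived from a fixed seed."""
    rng = np.random.default_rng(seed)
    return rng.standard_normal(shape).astype(np.float32)

# Same seed -> same starting noise -> same image for a given prompt/steps;
# a different seed gives a different image even with an identical prompt.
a = initial_latents(42)
b = initial_latents(42)
c = initial_latents(7)
print(np.allclose(a, b), np.allclose(a, c))  # True False
```

So unless the seed is held constant across the with-space and without-space runs, the variation is down to the noise, not the prompt.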
What OS does this need? Using Ubuntu 20.04, I'm getting stuck on openvino:
> $ pip install
> Could not find a version that satisfies the requirement openvino==2022.1.0 (from -r requirements.txt (line 6)) (from versions: 2021.4.0, 2021.4.1, 2021.4.2)
I even upgraded to python3.9, which, inexplicably, is required but not available in the "supported" OS.
EDIT: apparently it requires a version of pip that's newer than the one bundled with Ubuntu.
For anyone else who runs into this issue, run this: pip install --upgrade pip
Then run this again: pip install -r requirements.txt
also, make sure you're using python3.9 or lower
https://stackoverflow.com/a/70501550/21539
I've set up a Discord Bot that turns your text prompt into images using Stable Diffusion.
You can invite the bot to your server via https://discord.com/api/oauth2/authorize?client_id=101337304...
Talk to it using the /draw Slash Command.
It's very much a quick weekend hack, so no guarantees whatsoever. Not sure how long I can afford the AWS g4dn instance, so get it while it's hot.
PS: Does anyone know where to host reliable NVIDIA-equipped VMs at a reasonable price?
Oh and get your prompt ideas from https://lexica.art if you want good results.
So how far away are we from having it on M1/M2 Macs, at least with regular processing? openvino may be one path I suppose: https://github.com/openvinotoolkit/openvino/issues/11554
I found this repo early on and have been using it to run inference on my M1 Pro MBP. https://github.com/ModeratePrawn/stable-diffusion-cpu
For me it runs at about 3.5 seconds per iteration per picture at 512x512.
There is also a fork that uses metal here and is much faster: https://github.com/magnusviri/stable-diffusion/tree/apple-si... but it doesn't support seeding the rng and will occasionally produce completely black output. Useful if you want to spit out a whole bunch of images for one prompt but you lose the ability to re-run a specific seed with a tweaked prompt or increased iterations.
> For me it runs at about 3.5 seconds per iteration per picture at 512x512.
Wow, that's impressively fast. I have a relatively recent Nvidia GPU that still takes 10 seconds, and the GPU is already almost as big as the entire MacBook.
I'm using the fork here: https://github.com/magnusviri/stable-diffusion.git (apple-silicon-mps-support branch).
Pretty easy to set up, though I had to take all the Homebrew stuff out of my environment before setting up the Conda environment (can also just export GRPC_PYTHON_BUILD_SYSTEM_OPENSSL=1 GRPC_PYTHON_BUILD_SYSTEM_ZLIB=1, at least in my case).
Otherwise, I followed the normal steps to set things up, and I'm now generating 1 image every 30 seconds at default settings. This is on an M1 Max MacBook Pro with 64GB RAM.
looks like there is an easier path using metal shaders: https://dev.to/craigmorten/setting-up-stable-diffusion-for-m...
and https://github.com/magnusviri/stable-diffusion/tree/apple-si...
I've been using this on my M1 Max and it works pretty well, 1.65 iterations per second (full precision, whereas my PC's 3080 can only do half-precision due to limited memory)... a 50-iteration image in about 40 seconds or so.
this worked fine for me, and running side by side with Intel CPU + nVidia 2070 it actually does not take much longer (and as a sibling said, seems to be working at full precision). It is one of the first things I've done that has properly made my M1 Max's fan spin up hard though!
PyTorch for m1 (https://pytorch.org/blog/introducing-accelerated-pytorch-tra... ) will not work: https://github.com/CompVis/stable-diffusion/issues/25 says "StableDiffusion is CPU-only on M1 Macs because not all the pytorch ops are implemented for Metal. Generating one image with 50 steps takes 4-5 minutes."
Yeah you can. Using the mps backend, just set PYTORCH_ENABLE_MPS_FALLBACK=1 to use the CPU for unimplemented ops. Takes a minute but it's mostly GPU accelerated.
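A sketch of what that looks like in practice: the environment variable has to be set before torch is imported, and the selection logic below (my own wrapper, not part of any repo) degrades to CPU if torch or MPS isn't available:

```python
import os

# Must be set before `import torch` for the CPU fallback to take effect.
os.environ["PYTORCH_ENABLE_MPS_FALLBACK"] = "1"

def pick_device():
    """Prefer Apple's Metal backend when PyTorch supports it, else CPU."""
    try:
        import torch
    except ImportError:
        return "cpu"
    mps = getattr(torch.backends, "mps", None)  # present in torch >= 1.12
    if mps is not None and mps.is_available():
        return "mps"
    return "cpu"

device = pick_device()
print(device)
```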
By comparison I can generate 512x512 images every 15 seconds on an RTX 3080 (although there's an initial 30 second setup penalty for each run)
those guys are also working on it atm :-) https://github.com/lstein/stable-diffusion/pull/179
I got it working in about an hour on an M1 Ultra, mostly compiling things and having to tweak some model code to be compatible with Metal. It works pretty well: about 1/10 to 1/20 of the performance I can get on a 3080.
Great work! Is there a similar project for (local) text generation (NLP) on a CPU + lots of RAM? I mean something transformer-based and of similar quality to GPT-3 (i.e. better than GPT-2). I understand that each prompt would take almost forever to complete, but I'm still curious whether something like that exists.
Yes. Fabrice Bellard wrote a highly optimised library (libnc) [1] for training and inference of neural networks on CPU (x86 with AVX-2), and implemented GPT-2 inference (gpt2tc) with it [2]. Later he added a CUDA backend to libnc. You can try it out at his website TextSynth [3] and I see it now runs various newer GPT-based models too, but it seems he hasn't released the code for that. Doesn't surprise me as he didn't release the code for libnc either, just the parts of gpt2tc excluding libnc (libnc is released as a free binary) so someone could reimplement GPT-J and the other models themselves.
Incidentally, he's currently leading the Large Text Compression Benchmark with a neural-network-based compressor called nncp [4], which is based on this work. It learns the transformer-based model as it goes, and the earlier versions didn't use a GPU.
[1] https://bellard.org/libnc/
[2] https://bellard.org/libnc/gpt2tc.html
[3] https://textsynth.com/
[4] http://www.mattmahoney.net/dc/text.html#1085
Yet another gem by the genius Fabrice B!
I kinda understand why he would not release the source code. Perhaps he's finally decided to monetize some of his coding skills. Maybe in the future he'll start releasing some of those newer and bigger models to the public, given that other big corps like FB have already started doing so (GPT-NeoX and OPT, as mentioned in the sibling comment by infinityio).
I've had success with GPT-J (6B) [0] and GPT-NeoX (20B) [1], but they probably aren't quite the quality level you'll want to have
On the other hand, Facebook has recently released the weights for a few sizes of their OPT model [2]. I haven't tried it, but that might be worth looking into, because they claim that their model is comparable to Davinci
Note that for CPU inference you'll generally be unable to use float16 datatypes; it may error out otherwise.
[0] https://huggingface.co/EleutherAI/gpt-j-6B [1] https://huggingface.co/EleutherAI/gpt-neox-20b [2] https://huggingface.co/facebook/opt-66b
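On the float16 point: half precision halves the memory footprint, which is why VRAM-limited GPU setups rely on it, but many CPU kernels are only implemented for float32. A quick NumPy illustration of the storage difference (the actual failure mode is in PyTorch's CPU ops, not NumPy):

```python
import numpy as np

w32 = np.ones((1024, 1024), dtype=np.float32)
w16 = w32.astype(np.float16)

# Same values, half the bytes: 4 MiB vs 2 MiB for this block of weights.
print(w32.nbytes // 2**20, w16.nbytes // 2**20)  # 4 2
```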
What's the status of running SD on AMD GPUs?
https://rentry.org/tqizb explains how to install ROCm and then pytorch for ROCm
ROCm does not support APUs; here is the list of supported GPUs: https://docs.amd.com/bundle/Hardware_and_Software_Reference_...
Tried it, could not get it to run on my RX570. Read conflicting information whether that series is still supported or not.
rocm-smi detects the card and shows me the temps etc., but rocminfo throws an error.
I found this repo which should apparently fix it, didn't change my situation though: https://github.com/xuhuisheng/rocm-gfx803/
what does APU mean here?
Where can I get up to speed on what's coming down the pipeline in this AI/ML image-making scene?
(And learn the agreed upon terms)
It’s mostly all on discord. These are the two most active ones with devs on board https://discord.gg/QNxzjUfu https://discord.gg/nZ3hkXRV
Twitter too
The best way I think is to try and run these models yourself. Depending on technical ability, you may want to run them on your own hardware, or use a service like dreamstudio.ai (which is run by the team behind Stable Diffusion afaik)
No one can tell.
Pandora's box has been opened.
Nothing is true, everything is permitted.
For a very high level view on what new technologies are coming, you can check out the Two Minute Papers youtube channel: https://www.youtube.com/c/K%C3%A1rolyZsolnai
I am on a team working on alternative YouTube recommendations and I discovered Yannic Kilcher and Machine Learning Street Talk off the list of channel recommendations for Two Minute Papers. The whole list has a ton of AI channels:
https://channelgalaxy.com/id%3DUCbfYPyITQ-7l4upoX8nvctg/
7' 12" on an ancient Intel Core i5-3350P CPU @ 3.10GHz (!) using BERT BasicTokenizer, default arguments
On reddit I found some older GPUs take about 5 mins, and this video [1] says 5 mins for CPU using this OpenVINO library. Not sure if OpenVINO makes CPU chips compete with GPUs. Has anyone heard of OpenVINO before?
[1] https://youtu.be/5iXhhf7ILME
OpenVINO is developed by Intel themselves, and is one of many methods to freeze models to make CPU inference possible and performant.
https://en.wikipedia.org/wiki/OpenVINO
https://github.com/openvinotoolkit/openvino#supported-hardwa...
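If you're wondering what using it looks like: OpenVINO converts a model to an IR (an .xml graph plus a .bin weights file) and compiles it for a target device. A guarded sketch against the 2022.x Python API (the model filename is a placeholder; the function returns None if openvino isn't installed or the file is missing):

```python
import os

def compile_for_cpu(model_xml="stable_diffusion.xml"):
    """Load an OpenVINO IR model and compile it for CPU inference."""
    try:
        from openvino.runtime import Core
    except ImportError:
        return None  # openvino not installed
    if not os.path.exists(model_xml):
        return None  # no IR file to load
    core = Core()
    model = core.read_model(model_xml)       # reads .xml plus sibling .bin
    return core.compile_model(model, "CPU")  # optimizes for the host CPU
```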
I'm curious about what makes this project special, since there are a lot of similar implementations of diffusion models based on pytorch/tf. Is it because it uses the CPU itself to run the diffusion process?
Yeah. For something like this, you ideally want a powerful GPU with 12-24GB of VRAM. If you have something like an RTX 2070 at the bare minimum, you probably don't need this and could do a lot more steps a lot faster on a GPU, but it's great for those who don't have that option.
A $500 RTX 3070 with 8GB of VRAM can generate 512x512 images with 50 steps in 7 seconds.
I didn't see any requirements on the page beyond a CPU on that list. Do you need a certain amount of RAM? Will more speed things up to a degree?
It used ~8GB of ram on my machine with similar generation time to the low vram fork of stable diffusion [1] running on my 4GB GTX1650.
[1] https://github.com/basujindal/stable-diffusion
Love this. OpenAI are livid. :^)
Why?
Because they no longer control the narrative.
The most powerful device I have is an iPad Pro (M1, 16GB RAM). Can I run this on that thing at all?
It is noticeably faster than the original model (~30-40%) on my machine.
openvino is an unsung hero.
Can't get it to install requirements on Windows with Python 3.10 and MS Build Tools 2022. Any tips?
I found a pretty good Docker container for it, though that's only really switching you from solving Python problems to Docker ones. Worth trying out if you have a Linux box or WSL installed though: https://github.com/AbdBarho/stable-diffusion-webui-docker
Python 3.10 will fail on openvino. I used these steps:
anaconda prompt
cd to the destination folder
conda create --name py38 python=3.8
conda activate py38
conda update --all
conda install openvino-ie4py -c intel
pip install -r requirements.txt
I also had to edit stable_diffusion.py: in the # decoder area, I changed vae.xml and vae.bin to vae_decoder.xml and vae_decoder.bin respectively.
From there I could run:
python stable_diffusion.py --prompt "Street-art painting of Emma Stone dancing, in picasso style"
For img2img, use this (note: a DIFFERENT program):
python demo.py --prompt "astronaut with jetpack floating in space with earth below" --init-image ./data/jomar.jpg --strength 0.5
It needs python 3.9.
Anything for M1 GPU?
It works just fine.
web demo for stable diffusion (txt2img): https://huggingface.co/spaces/stabilityai/stable-diffusion
github with web ui: https://github.com/hlky/stable-diffusion
dev repo (more features, may have bugs): https://github.com/hlky/stable-diffusion-webui
repo with docker: https://github.com/AbdBarho/stable-diffusion-webui-docker
colab repo (new): https://github.com/altryne/sd-webui-colab
can also run it in colab (includes img2img): https://colab.research.google.com/drive/1NfgqublyT_MWtR5Csmr
demo made with gradio: https://github.com/gradio-app/gradio
Why do you keep posting the same comment under every SD post? It doesn't contribute to the discussion, and it's not very relevant to the OP. Some of the links don't even work anymore.
Or if you don’t want to tweak anything, just use the hosted version? https://beta.dreamstudio.ai/
It's pretty expensive. They give you 100 free credits but I burned through that in about 10 minutes just trying to figure out how things worked. Didn't get any nice images.
After that, it's $1 per 100 credits, so about $6 an hour maybe.
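The arithmetic behind that estimate, taking the burn rate above at face value (100 credits in ~10 minutes, $1 per 100 credits):

```python
credits_per_minute = 100 / 10            # observed burn rate
dollars_per_credit = 1 / 100             # $1 per 100 credits
cost_per_hour = credits_per_minute * 60 * dollars_per_credit
print(f"${cost_per_hour:.2f}/hour")      # $6.00/hour
```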
Because I’ve got a 3080 already and don’t want to spend money?
This link appears to be dead:
> can also run it in colab (includes img2img): https://colab.research.google.com/drive/1NfgqublyT_MWtR5Csmr
I assume it was the same as this one: https://colab.research.google.com/drive/1AfAmwLMd_Vx33O9IwY2...
From https://laion.ai/faq/, the FAQ of the dataset that was used for training https://huggingface.co/spaces/stabilityai/stable-diffusion:
> LAION datasets are simply indexes to the internet, i.e. lists of URLs to the original images together with the ALT texts found linked to those images. While we downloaded and calculated CLIP embeddings of the pictures to compute similarity scores between pictures and texts, we subsequently discarded all the photos. Any researcher using the datasets must reconstruct the images data by downloading the subset they are interested in. For this purpose, we suggest the img2dataset tool.
I love the "*simply*", but doesn't it mean that (depending on country, laws etc., but generally):
1. The LAION group committed possible copyright infringements and even left undeniable evidence that they did - on top of their written testimony (dumping the "stolen goods into the river" does not make the infringement undone, does it?)
2. Any model trained on the "linked" data may commit copyright infringement.
3. As a consequence, you may be liable for using generated images.
I always wonder how it can possibly be legal at all, considering that as a human artist, if I were to copy material and remix it without proper permission, I would be liable (again, depending on the situation); but suddenly ML is around the corner and it's all great, and now you can keep remixing the potentially problematic output further, no questions asked!?
I guess there are no precedents, but why should an automaton/software (and its creators) be judged differently from persons? I don't want to spoil the fun, but what am I missing?
I'm also disappointed that this dataset did not make sure to only collect unproblematic content, like Creative Commons licenses that allow remixing. It would be a hell of an attribution list, but definitely better than what is presented here.
EDIT: Formatting
EDIT2: I actually followed one of the projects mentioned not the linked repository. Clarified above.
If these AIs were actually just "remixing" and creating collages, then perhaps I would agree with you... but there is no exact pixel data stored here. This is fairly obvious when you consider that Stable Diffusion was trained on 100 terabytes of images yet the actual model file is 4GB.
Now I'm not saying that nothing created by these AIs should be considered copyright infringement. As a human artist, you are not judged on your process, you are judged on the end results. The same should be done for the works created by these AIs.
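The 100 TB vs 4 GB comparison is easy to quantify: if the model were storing pixel data, each byte of training data would get only a tiny fraction of a bit in the model file.

```python
training_bytes = 100 * 10**12   # ~100 TB of training images (figure from above)
model_bytes = 4 * 10**9         # ~4 GB model file
bits_per_source_byte = model_bytes * 8 / training_bytes
print(bits_per_source_byte)     # ~0.0003 bits of model per byte of training data
```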
Bad cases make bad law - if you argue too hard in the direction of "any copyrighted material in the AI's training set makes it copyrighted" this could lead to, say, "Disney owns any animated movie made by someone who watched a Disney movie".
You can make an AI that doesn't memorize a specific training input; similarly you could probably make one that intentionally memorizes them. Both of these seem useful.
It's not simply a given that using copyright material to train a model is copyright violation.
In my view it isn't. No one image contributes a significant amount, and the process the machine performs is analogous to what a human does when learning.
It is likely legal, but is it ethical? If it is not ethical, should it be legal?
We do tend to treat humans differently based on them being sentient beings with a limited lifespan, not machines.
I'm all for having these models scrutinized for copyright violations (and possibly amending copyright laws), but this comment is nothing but low-effort FUD.
Is feeding copyrighted material into an AI really copyright infringement?
If it is, then every human brain is guilty of copyright infringement.
I don't know, but it's quite entertaining when the output occasionally has a corrupted, but recognizable, Getty Images watermark: https://imgur.com/SmibVME
(Prompt: "A horse delivering mail in New York City, 1870")
Is training the model infringement or is distributing the model infringement?
What if you trained the model and only distributed generated images?
Is a human making art "in the style of" also infringement?
Legally, it is uncharted territory on many levels. I think there are good arguments to be made that these systems violate the intent behind copyright and trademarks, but not necessarily the laws.