Comment by dartos
1 year ago
I think the root problem is assuming that these generated images are representations of anything.
Nobody should.
They’re literally semi-random graphic artifacts that we humans give 100% of the meaning to.
So you're saying whatever the model generates doesn't have to be tethered to reality at all? I wonder if you think the same about ChatGPT. Do you think it should just make up whatever it wants when asked a question like "why does it rain?" After all, you could say the words it generates are also a semi-random sequence of letters that humans give meaning to.
I think going to a statistics-based generator with the intention of taking what you see as an accurate representation of reality is a non-starter.
The model isn’t trying to replicate reality, it’s trying to minimize some error metric.
Sure it may be inspired by reality, but should never be considered an authority on reality.
And yes, the words an LLM writes have no meaning. We assign meaning to the output. There was no intention behind them.
The fact that some models can perfectly recall _some_ information that appears frequently in the training data is a happy accident. Remember, transformers were initially designed for translation tasks.
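To make the "minimize some error metric" point above concrete, here is a minimal, hypothetical sketch of a next-token training step in PyTorch, with a toy model and fake data: the only thing the optimizer ever sees is a cross-entropy loss against the training batch, not any notion of truth.

```python
import torch
import torch.nn.functional as F

# Toy next-token predictor: an embedding plus a linear head over a fake vocabulary.
vocab_size, dim = 100, 32
model = torch.nn.Sequential(
    torch.nn.Embedding(vocab_size, dim),
    torch.nn.Linear(dim, vocab_size),
)
optimizer = torch.optim.Adam(model.parameters())

# A fake training batch of token ids; shift by one to get (input, target) pairs.
tokens = torch.randint(0, vocab_size, (8, 16))
inputs, targets = tokens[:, :-1], tokens[:, 1:]

# The quantity being minimized is this error metric (cross-entropy),
# not factual accuracy.
logits = model(inputs)  # (batch, seq, vocab)
loss = F.cross_entropy(logits.reshape(-1, vocab_size), targets.reshape(-1))
loss.backward()
optimizer.step()
```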
> Do you think it should just make up whatever it wants when asked a question like "why does it rain?"
Always doing that would be preferable to the status quo, where it does it just often enough to do damage while retaining a veneer of credibility.
> They’re literally semi-random graphic artifacts that we humans give 100% of the meaning to.
They're graphic artifacts generated semi-randomly from a training set of human-created material.
That's not quite the same thing, as otherwise the "adjustment" here wouldn't have been considered by Google in the first place.
The fact that the training data is human-curated arguably further removes the generations from representing reality (as we see here with this whole little controversy).
I think, with respect to the point I was making, they are the same thing.
But then if it simply reflected reality there would also be no problem, right, because it’s a synthetically generated output? Like if instead of people it output animals, or if it drew representative data from actual sources to answer the question. In either case it should be “ok” because it’s generated? They might as well output Planet of the Apes or Starship Troopers bugs…
With emphasis on the "semi-". They are very good at following prompts, and so overplaying the "random" part is dishonest. When you ask it for something, and it follows your instructions except for injecting a bunch of biases for the things you haven't specified, it matters what those biases are.
Are they good at following prompts?
Unless I format my prompts very specifically, diffusion models are not good at following them. Even then I need to constantly tweak my prompts and negative prompts to zero in on what I want.
That process is novel and pretty fun, but it doesn’t imply the model is good at following my prompt.
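For what it's worth, that tweaking loop looks roughly like this in code. A minimal sketch assuming the Hugging Face diffusers StableDiffusionPipeline; the model id and prompts are just placeholders:

```python
import torch
from diffusers import StableDiffusionPipeline

# Illustrative checkpoint; any Stable Diffusion model works the same way.
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# Each iteration is a manual nudge: add detail to the prompt, push unwanted
# traits into the negative prompt, rerun, and inspect the result.
image = pipe(
    prompt="a watercolor painting of a lighthouse at dawn, soft light",
    negative_prompt="blurry, extra limbs, text, watermark",
    guidance_scale=7.5,
    num_inference_steps=30,
).images[0]
image.save("attempt_01.png")
```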
LLMs are similar. Initially they seem good at following a prompt, but continue the conversation and they start showing recall issues, knowledge gaps, improper formatting, etc.
It’s not dishonest to say semi-random. It’s accurate. The sampling step of inference, for example, draws each token from a probability distribution that the model generates. Literally stochastic.
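Concretely, that stochastic step looks something like this (a minimal PyTorch sketch, with toy logits standing in for a real model's output):

```python
import torch

def sample_next_token(logits: torch.Tensor, temperature: float = 1.0) -> int:
    # Softmax turns the model's per-token scores into a probability distribution.
    probs = torch.softmax(logits / temperature, dim=-1)
    # Drawing from that distribution is the stochastic part: the same logits
    # can produce different tokens on different runs.
    return torch.multinomial(probs, num_samples=1).item()

# Toy logits for a 5-token vocabulary.
logits = torch.tensor([2.0, 1.0, 0.5, 0.1, -1.0])
print(sample_next_token(logits))
```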