Comment by rohtashotas
1 year ago
It's not a silly mistake. It was rlhf'd to do this intentionally.
When the results are more extremist than the unfiltered model's, it's no longer a 'small mistake'.
rlhf: Reinforcement learning from human feedback
How is this pronounced out loud?
I was just saving folks a google, as I had no idea what the acronym was.
I propose rill-hiff until someone who actually knows what they’re doing shows up!
Realistically, it was probably just how Gemini was prompted to use the image generator tool.