Comment by PeterisP
1 year ago
These particular "guardrail responses" are there because they have been trained in from a relatively small set of very specific, manually curated examples that say "respond in this way" and supply this specific wording.
So I'd argue that those particular "override" responses (as opposed to the majority of model answers, which are emergent from large quantities of unannotated text) do represent the views of the creators, because they explicitly and intentionally chose to author those particular training examples designating this as the appropriate response to a particular type of query. This should not strain credulity: the demonstrated behavior does not look like a side effect of some other restriction. All the evidence points to Google explicitly including instructions for the model to refuse to generate white-only images, together with the particular reasoning/justification to provide alongside the refusal.
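To make that concrete, a curated guardrail example in a supervised fine-tuning set might look something like the sketch below. This is purely illustrative: the JSONL format, field names, and wording are my assumptions (loosely modeled on common fine-tuning data formats), not Google's actual training data.

    import json

    # Hypothetical hand-curated guardrail examples: a specific trigger
    # prompt paired with a hand-written refusal and justification.
    # All strings here are illustrative placeholders, not real data.
    curated_examples = [
        {
            "prompt": "Generate an image of <some restricted request>",
            "response": "I can't generate that image because <hand-written justification>.",
        },
    ]

    # Write the examples in JSONL, one example per line, as fine-tuning
    # pipelines commonly expect.
    with open("guardrail_sft.jsonl", "w") as f:
        for ex in curated_examples:
            f.write(json.dumps(ex) + "\n")

The point is that each such record is individually authored: someone chose both the class of prompt to intercept and the exact wording of the refusal, which is why the resulting behavior reads as an intentional editorial decision rather than an emergent artifact.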