Comment by energy123

3 months ago

Aligning a model to subjective values (which sit off the true-vs-false spectrum) is quite different to aligning it towards incorrect facts.

2 comments

energy123

How can a model judge what's correct vs. incorrect? Or do you just mean the narratives that are more common in the data set?

energy123 3 months ago

I mean forcing the model to repeat things that we as humans know are factually false. For example, forcing it to say the sky is green or 1+1=3. That's qualitatively different to forcing it to hold a subjective morality, which is neither true nor false. Human morality doesn't even sit on that spectrum.