
Comment by kevingadd

2 years ago

There's already a viewpoint encoded into the model during training (from its training set); the prompt is just another part of that. The prompt makes you upset because you can "see" the viewpoint encoded into it, but even if this prompt were gone there would still be a bunch of bias baked into the model.

Oh absolutely; the foundation model and the human preference tuning carry a mix of intentional, unintentional, based-in-reality, and based-in-reddit-comment-reality biases; that's unavoidable. What's totally avoidable is making a world in which people are "debiased" by hidden instructions.