Comment by skhameneh

4 days ago

I can, but I don't have a specific example on hand that I've actually used. And trying to share an exact one would read like a double negative.

The general rule of thumb is to put only what you want into context. If you put instructions about what not to do in context, those tokens can be misunderstood and steer the model in unintended or unwanted directions.
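One rough way to apply that rule is to scan a prompt for "what not to do" lines before sending it, then rewrite each flagged line as the behavior you do want. This is only a sketch: `flag_negative_instructions`, the cue list, and both sample prompts are made up for illustration, and a keyword match will miss plenty of phrasings.

```python
import re

# Illustrative, non-exhaustive list of negation cues (an assumption, not a standard).
NEGATION_CUES = re.compile(r"\b(?:do not|don't|never|avoid|no)\b", re.IGNORECASE)


def flag_negative_instructions(prompt: str) -> list[str]:
    """Return the prompt lines that contain a negation cue."""
    return [
        line.strip()
        for line in prompt.splitlines()
        if NEGATION_CUES.search(line)
    ]


# Hypothetical prompts: the second says the same thing using only wanted behavior.
negative_prompt = "Reply to the customer.\nDon't be rude.\nNever mention competitors."
positive_prompt = "Reply to the customer.\nBe polite.\nTalk only about our own products."

print(flag_negative_instructions(negative_prompt))
print(flag_negative_instructions(positive_prompt))
```

The first call flags the two "don't"/"never" lines and the second flags nothing, which is the state you are aiming for.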

A fair example would be testing for positive sentiment. Consider the weight of the tokens you append to context, and phrase instructions or questions neutrally or positively.

For example, some phrasings and their impact:

- "Is the tone of the user message positive?" will be biased toward false positives.

- "Analyze the tone of the user message." will be more neutral and less biased.

- "Is the tone of the message negative?" will be biased toward false positives when evaluating for negative tone.
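If you want to check this on your own model rather than take my word for it, a small harness like the one below could compare phrasings on messages you have already labeled. Treat it as a sketch under assumptions: `ask_model` is a hypothetical callable you would wrap around whatever client you use, the `NEUTRAL` wording adds a one-word answer format that is not in the phrases above, and the "starts with positive/yes" check is a crude way to read the reply.

```python
from typing import Callable, Iterable, Tuple

# The leading phrasing from the list above, plus a neutral variant.
LEADING = "Is the tone of the user message positive?"
NEUTRAL = (
    "Analyze the tone of the user message. "
    "Reply with one word: positive, neutral, or negative."
)


def build_prompt(instruction: str, user_message: str) -> str:
    """Combine an instruction with the message being evaluated."""
    return f"{instruction}\n\nUser message: {user_message}"


def false_positive_rate(
    ask_model: Callable[[str], str],  # hypothetical: takes a prompt, returns the reply text
    instruction: str,
    labeled_messages: Iterable[Tuple[str, str]],  # (message, true_label) pairs
) -> float:
    """Share of non-positive messages that the model calls positive."""
    non_positive = [msg for msg, label in labeled_messages if label != "positive"]
    if not non_positive:
        return 0.0
    wrong = sum(
        1
        for msg in non_positive
        if ask_model(build_prompt(instruction, msg))
        .strip()
        .lower()
        .startswith(("positive", "yes"))
    )
    return wrong / len(non_positive)
```

Run it once with `LEADING` and once with `NEUTRAL` over the same labeled set; if the leading question really biases the model, its rate should come out higher.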