Comment by CGamesPlay

4 days ago

> Writing style. In agent docs, using all caps might be an effective way to emphasize a particular instruction. In internal eng docs, this might come off rude or distracting.

To pile on to this: an agent needs to see "ABSOLUTELY NEVER do suchandsuch" in order not to do suchandsuch, yet it still has a pretty fair chance of doing it by accident. A talented human seeing "ABSOLUTELY NEVER do suchandsuch" will interpret it to mean there are consequences to doing suchandsuch, like being fired or causing production downtime. So the same message is received differently by the two types of readers.

skhameneh 4 days ago

Negative assertions can lead to unwanted weights in the context.

I've found positive assertions to be more predictable.

taikahessu 4 days ago

This. I have noticed the same thing when working with Stable Diffusion. Adding negatives can sometimes lead to the opposite result.
From what I can tell, if you say "no computers", for example (i.e. adding "computer" as a negative), you are setting the scene for something like "where there should be a computer, there is not".
I can't describe this phenomenon any better, only that it can completely change the output in unexpected, unwanted ways.
AB - B = AC

superfish 4 days ago

Do you mind sharing a specific concrete example? I'm curious.

- skhameneh 4 days ago
  
  I can, but I don't have a specific example I've used on hand at the moment. And trying to share an exact example would read like a double negative.
  The general rule of thumb is to put only what you want in context. If you put instructions about what not to do in context, those tokens can be misunderstood and steer the model in unintended, unwanted ways.
  A fair example would be testing for positive sentiment. Consider the weight of the tokens appended to context, and phrase instructions or questions neutrally or positively.
  e.g. Some phrases and their impact:
  - "Is the tone of the user message positive?" will be biased toward a false positive.
  - "Analyze the tone of the user message." will be more neutral and less biased.
  - "Is the tone of the message negative?" will be biased toward false positives when evaluating for negative tone.
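
To make the three phrasings above easier to compare, here is a minimal sketch that flags an instruction as "leading" when it names some, but not all, of the possible labels. The label set, the function names, and the heuristic itself are assumptions made up for illustration. The check only inspects the wording; it does not measure how any particular model actually responds.

```python
# Hypothetical label set; the thread does not specify one.
LABELS = ("positive", "neutral", "negative")

# The three phrasings discussed above, keyed by illustrative names.
PHRASINGS = {
    "asks_positive": "Is the tone of the user message positive?",
    "neutral": "Analyze the tone of the user message.",
    "asks_negative": "Is the tone of the message negative?",
}


def labels_named(instruction: str) -> list[str]:
    """Return the labels that the instruction text itself mentions."""
    text = instruction.lower()
    return [label for label in LABELS if label in text]


def is_leading(instruction: str) -> bool:
    """Heuristic (an assumption, not an established metric): an instruction
    is 'leading' if it names at least one label but not every label."""
    named = labels_named(instruction)
    return 0 < len(named) < len(LABELS)


def build_prompt(instruction: str, user_message: str) -> str:
    """Place the instruction ahead of the message being evaluated."""
    return f"{instruction}\n\nUser message:\n{user_message}"


if __name__ == "__main__":
    for name, instruction in PHRASINGS.items():
        print(name, labels_named(instruction), is_leading(instruction))
```

Under this heuristic only the "Analyze the tone" phrasing passes as non-leading, which matches the ordering described above. Whether a given model is actually biased by the other two would have to be checked empirically.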