Does it really? I'd be surprised if abuse actually worked better than sternly worded warnings/instructions, and even if it did, it doesn't seem healthy to get used to that type of prompting.
its part of making sure the model actually engages in emotive communication, if i'm inventing insults i've never even thought about, i'm furious :)
saying i'm "furious" has lower entropy that incredibly implausible abuse. In some first party harnesses it just results in doom loops, but thats usually because the COT is hidden after the immediate turn in those setups. COT persistence helps with a lotta stuff
Does it really? I'd be surprised if abuse actually worked better than sternly worded warnings/instructions, and even if it did, it doesn't seem healthy to get used to that type of prompting.
its part of making sure the model actually engages in emotive communication, if i'm inventing insults i've never even thought about, i'm furious :)
saying i'm "furious" has lower entropy that incredibly implausible abuse. In some first party harnesses it just results in doom loops, but thats usually because the COT is hidden after the immediate turn in those setups. COT persistence helps with a lotta stuff