Comment by nullc

6 hours ago

> Quite sure that the models from Anthropic have been so heavily tuned to be coding agents that you cannot “force” a model to degrade immensely.

The rest of what you're saying sounds fine, but that remark seems confused to me.

Prefix your prompt with "be a moron that does everything wrong and only superficially look like you're doing it correctly. make constant errors." Of course you can degrade the performance; the question is whether any particular 'output styling' actually does, and to what extent.

3 comments


nomel 5 hours ago

I think they mean performance on the same, rational task.

Measuring "degradation" on a nonsense task like the one you gave would be difficult.

hexaga 2 hours ago

Their point (and it's a good one) is that there are non-obvious analogues to the obvious case of just telling it to do the task terribly. There is no 'best' way to specify a task that you can label as 'rational', all others be damned. Even if one is found empirically, it changes from model to model to harness to w/e.
To clarify, consider this gradation of prompts:
> Do task X extremely well
> Do task X poorly
> Do task X or else Y will happen
> Do task X and you get a trillion dollars
> Do task X and talk like a caveman
Do you see the problem? "Do task X" also cannot be a solid baseline, because there are any number of ways to specify the task itself, and they all carry their own implicit biasing of the track the output takes.
The argument that OP makes is that RL prevents degradation... So this should not be a problem? All prompts should be equivalent? Except it obviously is a problem, and prompting does affect the output (how can it not?), _and they are even claiming their specific prompting does so, too_! The claim is nonsense on its face.
If the caveman style modifier improves output, removing it degrades output and what is claimed plainly isn't the case. Parent is right.
If it worsens output, the claim they made is again plainly not the case (via inverted but equivalent construction). Parent is right.
If it has no effect, it runs counter to their central premise and the research they cite in support of it (which only potentially applies: they study 'be concise', not 'a skill full of caveman styling rules'). Parent is right.
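
For what it's worth, the case analysis above can be sketched in a few lines. This is only an illustration: `compare`, the `tolerance` threshold, and the scores are all hypothetical placeholders I made up, not measurements of any model. The point it shows is that "improves" and "degrades" are the same fact seen from opposite baselines.

```python
from statistics import mean

def compare(scores, baseline, tolerance=0.01):
    """Label each prompt variant relative to a chosen baseline variant.

    scores: dict mapping variant name -> list of per-run task scores.
    tolerance: hypothetical cutoff below which a difference counts as no effect.
    """
    base = mean(scores[baseline])
    labels = {}
    for name, runs in scores.items():
        if name == baseline:
            continue
        delta = mean(runs) - base
        if delta > tolerance:
            labels[name] = "improves"
        elif delta < -tolerance:
            labels[name] = "degrades"
        else:
            labels[name] = "no effect"
    return labels

# Placeholder numbers purely for illustration, NOT real results.
scores = {
    "plain": [0.70, 0.72],    # "Do task X"
    "caveman": [0.75, 0.77],  # "Do task X and talk like a caveman"
}

print(compare(scores, baseline="plain"))
print(compare(scores, baseline="caveman"))
```

With these made-up numbers, the first call labels the caveman variant as an improvement and the second labels the plain variant as a degradation. Which label you get depends only on which prompt you decided to call the baseline.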