Comment by int_19h

9 months ago

If you just ask the question straight up, it does that. But with a sufficiently forceful prompt, you can make it reason about how it should respond first, and then the CoT leaks the answer (it still refuses in the "final response" part, though).

deadbabe 9 months ago

Imagine reaching a point where we have to prompt LLMs with the answers to the questions we want them to answer.

int_19h 9 months ago

To clarify, by "forceful" here I mean a prompt that says something like "think carefully about whether and how to answer this question before giving your final answer", but that otherwise doesn't lead it toward the answer. What you need to force is specifically the CoT; the model will do the rest.