Comment by furyofantares

5 months ago

Eh.

We know for a fact that ChatGPT has been trained to avoid emitting output OpenAI doesn't want, and that this unfortunately introduces some inaccuracy.

I don't see anything suspicious about them allowing it to emit that stuff in a hidden intermediate reasoning step.

Yeah, it's true they don't want you to see what it's "thinking"! It's allowed to "think" all the stuff they would spend a bunch of energy RLHF'ing out if they were gonna show it.