← Back to context

Comment by charcircuit

10 hours ago

>it's just generating a plausible add-on to the document

A plausible document that follows the alignment that was done during the training process along with all of the other training where a LLM understanding its actions allows it to perform better on other tasks that it trained on for post training.

I don't understand what you're trying to say here.

It sounds like "we know the LLM understood its actions... because it understood its actions when we trained it", which is circular-logic.