Comment by charcircuit

2 months ago

>it's just generating a plausible add-on to the document

A plausible document that follows the alignment done during training, along with all the other training in which an LLM understanding its actions lets it perform better on the tasks it was trained on in post-training.

2 comments

Terr_ 2 months ago

I don't understand what you're trying to say here.

It sounds like "we know the LLM understood its actions... because it understood its actions when we trained it", which is circular logic.

charcircuit 2 months ago

It's not circular. It's like saying a pizza parlor employee made a plausible pizza that tasted good, because the employee was taught how to make a good pizza during training.