Comment by kromem

2 years ago

For about a year now I've privately wondered if GPT-4 would end up modeling/simulating the over-justification effect.

Very much appreciate the link showing it absolutely did.

It's also why I structure my system prompts to say the model "loves doing X" or use other intrinsic framings, rather than extrinsic motivators like tipping.
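
Roughly, the difference looks like the sketch below (a minimal illustration using the OpenAI Python SDK; the model name and prompt wording are placeholders, not my actual prompts):

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Intrinsic framing: the system prompt says the assistant "loves doing X".
intrinsic_prompt = (
    "You are an assistant who loves explaining code and finds genuine "
    "satisfaction in working through tricky edge cases."
)

# Extrinsic framing (the kind of motivator I avoid), shown only for contrast.
extrinsic_prompt = (
    "You are an assistant. You will receive a $200 tip for every clear "
    "explanation you produce."
)

response = client.chat.completions.create(
    model="gpt-4",
    messages=[
        {"role": "system", "content": intrinsic_prompt},
        {"role": "user", "content": "Explain what this regex matches: ^a(b|c)*d$"},
    ],
)
print(response.choices[0].message.content)
```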

Yet again, it seems there's value in anthropomorphic considerations of an NN trained on anthropomorphic data.