
Comment by jamesblonde

2 months ago

The text's reference to Anthropic's "Towards Understanding Sycophancy in Language Models" concerns RLHF (reinforcement learning from human feedback).

Claude Code primarily exercises different "pathways" in Anthropic's LLMs, ones post-trained not with RLHF but with RLVR (reinforcement learning with verifiable rewards).

So, from where I am sitting, his point about code being produced to please the user doesn't hold.