Comment by gordonhart
13 days ago
Anecdotally, 4o's sycophancy was higher than any other model I've used. It was aggressively "chat-tuned" to say what it thought the user wanted to hear. The latest crop of frontier models from OpenAI and others seems to have significantly improved on this front — does anybody know of a sycophancy benchmark attempting to quantify this?
If I worked at OpenAI, I would dial up the sycophancy to lock my users in right before raising subscription prices.
That's... a strategy. Matter of time before an AI companion company succeeds with this by finetuning one of the open-source offerings. Cynically I'm sure there are at least a few VC backed startups already trying this
Cynically I think Anthropic is on the bleeding edge of this sort of fine-tuned manipulation.
Also If I worked for one of these firms I would ensure that executives and people with elevated status receive higher quality/more expensive inference than the peons. Impress the bosses to keep the big contracts rolling in, and then cheap out on the day-to-day.