← Back to context

Comment by gordonhart

13 days ago

Anecdotally, 4o's sycophancy was higher than any other model I've used. It was aggressively "chat-tuned" to say what it thought the user wanted to hear. The latest crop of frontier models from OpenAI and others seems to have significantly improved on this front — does anybody know of a sycophancy benchmark attempting to quantify this?

If I worked at OpenAI, I would dial up the sycophancy to lock my users in right before raising subscription prices.

  • That's... a strategy. Matter of time before an AI companion company succeeds with this by finetuning one of the open-source offerings. Cynically I'm sure there are at least a few VC backed startups already trying this

    • Cynically I think Anthropic is on the bleeding edge of this sort of fine-tuned manipulation.

      Also If I worked for one of these firms I would ensure that executives and people with elevated status receive higher quality/more expensive inference than the peons. Impress the bosses to keep the big contracts rolling in, and then cheap out on the day-to-day.