Comment by gordonhart

4 months ago

Anecdotally, 4o's sycophancy was higher than any other model I've used. It was aggressively "chat-tuned" to say what it thought the user wanted to hear. The latest crop of frontier models from OpenAI and others seems to have significantly improved on this front — does anybody know of a sycophancy benchmark attempting to quantify this?


co_king_3 4 months ago

If I worked at OpenAI, I would dial up the sycophancy to lock my users in right before raising subscription prices.

gordonhart 4 months ago
That's... a strategy. Only a matter of time before an AI companion company succeeds with this by fine-tuning one of the open-source offerings. Cynically, I'm sure there are at least a few VC-backed startups already trying this.
- co_king_3 4 months ago
  
  Cynically, I think Anthropic is on the bleeding edge of this sort of fine-tuned manipulation.
  Also, if I worked for one of these firms, I would ensure that executives and people with elevated status receive higher-quality/more expensive inference than the peons. Impress the bosses to keep the big contracts rolling in, and then cheap out on the day-to-day.