Comment by samrus

2 days ago

Trying to follow invalid/impossible prompts by producing an invalid/impossible result and pretending it's all good is a regression. I would expect a confident coder to point out that the prompt/instruction was invalid. This test is valid; it highlights sycophantism.

I know “sycophantism” is a term of art in AI, and I’m sure it has diverged a bit from the English definition, but I still thought it had to do with flattering the user?

In this case the desired response is defiance of the prompt, not rudeness to the user. The test is looking for misalignment driven by helpfulness.

  • > I still thought it had to do with flattering the user?

    Assuming the user to be correct, and ignoring contradictory evidence to come up with a rationalization that favours the user's point of view, can be considered a kind of flattery.

    • But we could use this plausible, if hoop-jumping, definition of sycophancy… or we could just use a straightforward understanding of alignment: the newer bots are simply sticking closer to the user's request.

  • I believe the LLM is being sycophantic here because it's trying to follow a prompt even though the basis of the prompt is wrong. Emperor's-new-clothes kind of thing.

  • I'm inclined to view it less as a desire to please humans and more as a "the show must go on" bias in the mad-libs machine.

    A kind of improvisational "yes, and" that emerges from training, which seems sycophantic because that's one of the most common ways to say it.