Comment by notracks

7 hours ago

I recently found out that Claude's latest model, Sonnet 4.6, scores the highest on Bullsh*tBench[0] (funny name, I know). It's a recent benchmark that measures whether an LLM refuses nonsense or pushes back on bad choices, so Claude has definitely gotten better on that front.

[0] - https://petergpt.github.io/bullshit-benchmark/viewer/index.v...

I haven't tried talking to Sonnet much, but Opus 4.6 is very sycophantic. Not in the sense that it always explicitly agrees with you, but its answers strictly conform to the worldview embedded in your questions and never step outside it or push back.

It _does_ love to explicitly agree with anything it finds in web search though.

(Anthropic tries to fight this by adding a hidden prompt that makes it disagree with you and tell you to go to bed, which doesn't help.)

  • the go-to-bed thing gets annoying; you can't even hint that you're almost done or wrapping up or something without hyper-triggering it, and then it never stops.

    I do like it when Opus is incredibly short in its responses to prompts that probably shouldn't have been made, though. Keeps me grounded a bit.

Good call on censoring yourself preemptively; otherwise HN could demonetize your comment.

You don’t have to star out things like that on HN.

  • It would be interesting to me if you could explain the motivation behind posting your comment. From my perspective, somebody with five years of forum tenure who is sharp enough to comment on advanced benchmarks has probably noticed that the censorship here is voluntary, and has already made a personal decision on that front.

Great link, thanks for sharing. It confirms what I've seen empirically when comparing the different models in daily use.