Comment by merlindru

1 day ago

Same. 4.7 felt like a definite regression

Interestingly enough, 4.7 actually did regress on a few benchmarks from 4.6, so it's more than just vibes.

  • It seems like a lot of things fed into that. Anthropic couldn't keep up with the compute costs when they got a huge influx of users. (So) effort level defaults got turned down. (Looks like we have direct effort control in the web interface now - thrilled about that!) Adaptive Thinking, while usually cheaper for them, seems less robust than Extended Thinking. And this part is just vibes, but the alignment on 4.7 feels too stiff. I understand wanting the model to push back more, but it seems like 4.7 will push back reflexively in situations where it's just odd.

    • Claude got very mad at me and burned more tokens than exist to complain about me asking about a "yellow background cell" in an Excel spreadsheet.

  • 4.7 is a different base model from 4.6, so it's possible that they introduced regressions with pre-training changes, or undercooked the post-training stage.

    • Just speculating, but I "feel" 4.7 was post-trained using more synthetic techniques. The way it writes, for one thing, its "personality", is less human and more like fatiguing AI slop.

4.7 was just them starting on the path to getting prices in line with the actual cost.

Make it dumber. Charge more (by changing the tokenizer). Call it the latest and greatest. Reset expectations.