
Comment by nsoonhui

4 days ago

I'm not entirely sure how this kind of study squares with other studies, such as "Reasoning models don't always say what they think" [0] (discussion: [1]).

To quote the article:

  We can’t be certain of either the “legibility” of the Chain-of-Thought (why, after all, should we expect that words in the English language are able to convey every single nuance of why a specific decision was made in a neural network?) or its “faithfulness”—the accuracy of its description. There’s no specific reason why the reported Chain-of-Thought must accurately reflect the true reasoning process; there might even be circumstances where a model actively hides aspects of its thought process from the user.

So if we can't trust the reported reasoning in the first place, what's the point of checking whether the chains of thought are "effective" or not?

[0]: https://news.ycombinator.com/item?id=43572374