Comment by nsoonhui
4 days ago
I'm not entirely sure how this kind of study jibes with other studies, such as "Reasoning models don't always say what they think" [0], discussion [1].
To quote the article:
We can’t be certain of either the “legibility” of the Chain-of-Thought (why, after all, should we expect that words in the English language are able to convey every single nuance of why a specific decision was made in a neural network?) or its “faithfulness”—the accuracy of its description. There’s no specific reason why the reported Chain-of-Thought must accurately reflect the true reasoning process; there might even be circumstances where a model actively hides aspects of its thought process from the user.
So if we can't trust the reasoning, then what's the point of checking whether the chains of thought are "effective" or not?