Comment by Lerc

1 month ago

That makes sense because It's a relevance problem, not a reasoning problem. Adding the hint that it is a test implicitly says 'don't assume relevance'

It is reading

I want to X, the X'er is 50meters away, should I walk or drive?

It would be very unusual for someone to ask this in a context where X decides the outcome, because in that instance it the question would not normally arise.

By actually asking the question there is a weak signal that X is not relevant. Models are probably fine tuned more towards answering the question in the situation where one would normally ask. This question is really asking "do you realise that this is a condition where X influences the outcome?"

I suspect fine tuning models to detect subtext like this would easily catch this case but at the same time reduce favourability scores all over the place.