Comment by KoolKat23

1 year ago

I'm noticing a strange common theme in all these riddles being asked and answered wrong.

They're all badly worded questions. The model knows something is up and reads into it too much. In this case it's the tautology: you would usually say "a mother and her son...".

I think it may answer correctly if you start off asking "Please solve the below riddle:"

There was another example yesterday which it solved correctly after this addition. (In that case the points of view were all mixed up; it only worked as a riddle.)

> They're all badly worded questions. The model knows something is up and reads into it too much. In this case it's the tautology: you would usually say "a mother and her son...".

How is "a woman and her son" badly worded? The meaning is clear and blatently obvious to any English speaker.

  • Go read the whole riddle; add the rest of it and you'll see it's contrived, hence it's a riddle even for humans. The model in its thinking (which you can read) places undue weight on certain anomalous details. In practice, a person would phrase this far more eloquently than the riddle does.

Yup. The models fail on gotcha questions asked without warning, especially when evaluated on the first snap answer. Much like approximately all humans.

  • > especially when evaluated on the first snap answer

    The whole point of o1 is that it wasn't "the first snap answer"; it wrote half a page internally before giving the same wrong answer.

    • Is that really its internal 'chain of thought', or is it a post-hoc justification generated afterward? Do LLMs have a chain of thought like this at all, or are they just convincingly mimicking what a human might say if asked to justify an opinion?