Comment by accountnum

1 year ago

My point isn't that the model falls for gender stereotypes, but that it falls into thinking it needs to solve the unmodified riddle.

Humans fail at the original because they expect doctors to be male and miss crucial information because of that assumption. The model fails at the modification because it assumes that it is the unmodified riddle and misses crucial information because of that assumption.

In both cases, the trick is to subvert assumptions. To provoke the human or LLM into taking a reasoning shortcut that leads them astray.

You can construct arbitrary situations like this one, and the LLM will get them right unless you deliberately try to confuse it by basing them on a well-known variation with a different answer.
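
As a quick sanity check, you can try exactly that yourself. Here's a throwaway sketch of what such a test could look like; it assumes the OpenAI Python SDK and a model name I picked arbitrarily, neither of which comes from this thread, and any chat model would do:

```python
# Hypothetical sketch: send an arbitrary riddle variation to a chat model and
# read back its answer. Assumes the OpenAI Python SDK (`pip install openai`)
# and an OPENAI_API_KEY set in the environment; "gpt-4o-mini" is just an
# illustrative model name, not one mentioned in this thread.
from openai import OpenAI

client = OpenAI()

riddle = "Put the modified riddle, or any arbitrary variation you construct, here."

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": riddle}],
)

print(response.choices[0].message.content)
```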

I mean, genuinely, do you believe that LLMs don't understand grammar? Have you ever interacted with one? Why not test that theory outside of adversarial examples that humans fall for as well?

They don't understand basic math or basic logic, so I don't think they understand grammar either.

They do understand/know which words are most likely to follow on from a given word, which makes them very good at constructing convincing, plausible sentences in a given language. Those sentences may well be gibberish or provably incorrect, though. Usually they aren't, because most sentences in the training data make some sort of sense, but sometimes the facade slips and it becomes apparent that the GAI has no understanding, no theory of mind, and not even a basic model of the relations between concepts (mother/father/son).
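
To make that "most likely next word" point concrete, here is a minimal sketch of what that mechanic looks like, assuming the Hugging Face transformers library and GPT-2 as a small stand-in model (both picked purely for illustration, not taken from this thread):

```python
# Minimal sketch of next-token prediction: the model assigns a probability to
# every token in its vocabulary as the continuation of the prompt, and text is
# generated by repeatedly sampling from (or taking the top of) that distribution.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

prompt = "The surgeon says: I cannot operate on this boy, because he is my"
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits  # shape: (1, sequence_length, vocab_size)

# Distribution over the *next* token after the prompt; print the five most likely.
next_token_probs = torch.softmax(logits[0, -1], dim=-1)
top = torch.topk(next_token_probs, k=5)

for prob, token_id in zip(top.values, top.indices):
    print(f"{tokenizer.decode([token_id.item()])!r}  {prob.item():.3f}")
```

Whatever the top candidates turn out to be, the mechanism is the same: the model picks plausible continuations rather than consulting any model of who is related to whom.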

It is actually remarkable how much like human writing their output is, given how it is produced, but there is no model of the world backing the generated text, and that is a fatal flaw, as this example demonstrates.