
Comment by exasperaited

2 months ago

This is not a demonstration of a trick question.

This is a demonstration of a system that delusionally refuses to accept correction and correct its misunderstanding (and accepting correction is fundamental to their claim of intelligence through reasoning).

Why would anyone believe these things can reason, that they are heading towards AGI, when halfway through a dialogue where you're trying to tell it that it is wrong it doubles down with a dementia-addled explanation about the two bs giving the word that extra bounce?

It's genuinely like the way people with dementia sadly shore up their confabulations with phrases like "I'll never forget", "I'll always remember", etc. (Which is something that... no never mind)

> Even OSS 20b gets it right the first time, I think the author was just mistakenly routed to the dumbest model because it seemed like an easy unimportant question.

Why would you offer up an easy out for them like this? You're not the PR guy for the firm swimming in money paying million dollar bonuses off what increasingly looks, at a fundamental level, like castles in the sand. Why do the labour?

It's a trick question for an artificial intelligence that tokenizes words. Humans have plenty of different weaknesses of their own.

>Why would you offer up an easy out for them like this? You're not the PR guy for the firm swimming in money paying million dollar bonuses off what increasingly looks, at a fundamental level, like castles in the sand. Why do the labour?

I deeply hate OpenAI and everything it stands for. But I can't deny the fact that they're more or less dominating the market and releasing SOTA models on a regular basis, so trying to understand why and how these models fail seems important if you don't want to get left behind.

  • It’s a more difficult question for LLMs due to tokenization, but far from a trick one. There’s no word play or ambiguity involved.

the extra bounce was my favorite part!

  • I mean if it was a Black Mirror satire moment it would rapidly become part of meme culture.

    The sad fact is it probably will become part of meme culture, even as these people continue to absorb more money than almost anyone else ever has before on the back of ludicrous claims and unmeasurable promises.

> This is not a demonstration of a trick question.

It's a question that purposefully exploits a limitation of the system. There are many such questions for humans; they are called trick questions. It is not that crazy to call this one a trick question.

> This is a demonstration of a system that delusionally refuses to accept correction and correct its misunderstanding (which is a thing that is fundamental to their claim of intelligence through reasoning).

First, the word 'delusional' is strange here unless you believe we are talking about a sentient system. Second, you are just plain wrong. LLMs are not "unable to accept correction" at all; in fact, they often accept incorrect corrections (sycophancy). In this case the model is simply unable to understand the correction (because of the nature of the tokenizer), and it is therefore 'correct' behaviour for it to insist on its incorrect answer.
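To make the tokenizer point concrete, here's a rough sketch of the mismatch. It assumes the tiktoken library and its cl100k_base vocabulary purely as a stand-in; the exact split varies by model, and I'm not claiming this is what any production stack does:

    # Why letter-counting is awkward for an LLM: the model operates on
    # subword tokens, not characters. (Sketch only; assumes tiktoken is
    # installed and uses cl100k_base as an example vocabulary.)
    import tiktoken

    enc = tiktoken.get_encoding("cl100k_base")
    word = "blueberry"

    token_ids = enc.encode(word)
    pieces = [enc.decode_single_token_bytes(t).decode("utf-8") for t in token_ids]

    print("token pieces the model sees:", pieces)
    print("b's at the character level:", word.count("b"))  # 2, trivially

The model only ever sees the subword pieces, so "count the b's" asks it to reason about characters it never directly observes, and a correction phrased in terms of letters doesn't map cleanly onto its input.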

> Why would anyone believe these things can reason, that they are heading towards AGI, when halfway through a dialogue where you're trying to tell it that it is wrong it doubles down with a dementia-addled explanation about the two bs giving the word that extra bounce?

People believe the models can reason because they produce output consistent with reasoning. (That is not to say they are flawless or we have AGI in our hands.) If you don't agree, provide a definition of reasoning that the model does not meet.

> Why would you offer up an easy out for them like this? You're not the PR guy for the firm swimming in money paying million dollar bonuses off what increasingly looks, at a fundamental level, like castles in the sand. Why do the labour?

This, like many of your other messages, is rather obnoxious and dripping with performative indignation while adding little in the way of substance.