Comment by shadowjones

2 months ago

I think a lot of those trick questions getting stupid answers can be explained by simple economics.

It's just not sustainable for OpenAI to run GPT at the best of its abilities on every request. Their new router isn't trying to give you the most accurate answer; it's trying to strike a balance between speed, accuracy, and sustainable cost on their side.

(Kind of) a similar thing happened when 4o came out: they often tinkered with it, and the results were sometimes suddenly a lot worse. It's not that the model is bad; they're just doing all kinds of optimizations/tricks because they can barely afford to run it for everyone.

When sama says he believes it to be PhD-level, I almost believe him, because he has full access and can use it at 100% of its power all the time.

Even OSS 20b gets it right the first time; I think the author was just mistakenly routed to the dumbest model because it seemed like an easy, unimportant question.

> I think a lot of those trick questions getting stupid answers can be explained by simple economics.

> It's just not sustainable for OpenAI to run GPT at the best of its abilities on every request.

So how do I find out whether the answer to my question was run on the discount hardware, or whether it's actually correct?

  • I'd say use the API, with search and high reasoning enabled, if you want accuracy (rough sketch at the end of this comment).

    But then you also start to see part of why it doesn't make economic sense for them to do this on every request.

    Personally I assume that anything I send through their chat UI will run on the cheapest settings they can get away with.
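
    For what it's worth, here's a minimal sketch of what "use the API with high reasoning" can look like, assuming the OpenAI Python SDK's Responses API; the model name and the web-search tool type are assumptions and may need adjusting to whatever is currently available.

    ```python
    # Minimal sketch: ask the same question through the API with reasoning
    # effort turned up, instead of the consumer chat UI. Assumes the OpenAI
    # Python SDK's Responses API; the model name and the web-search tool
    # type below are assumptions.
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    resp = client.responses.create(
        model="gpt-5",                    # assumed model name
        reasoning={"effort": "high"},     # request more deliberate reasoning
        tools=[{"type": "web_search"}],   # optional search tool (type is an assumption)
        input="How many times does the letter 'b' appear in 'blueberry'?",
    )

    print(resp.output_text)
    ```

    You pay per token for this, which is exactly the point: running every casual chat question at that setting would be expensive.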

This is not a demonstration of a trick question.

This is a demonstration of a system that delusionally refuses to accept correction and correct its misunderstanding (which is a thing that is fundamental to their claim of intelligence through reasoning).

Why would anyone believe these things can reason, that they are heading towards AGI, when halfway through a dialogue where you're trying to tell it that it is wrong it doubles down with a dementia-addled explanation about the two bs giving the word that extra bounce?

It's genuinely like the way people with dementia sadly shore up their confabulations with phrases like "I'll never forget", "I'll always remember", etc. (Which is something that... no never mind)

> Even OSS 20b gets it right the first time; I think the author was just mistakenly routed to the dumbest model because it seemed like an easy, unimportant question.

Why would you offer up an easy out for them like this? You're not the PR guy for the firm swimming in money paying million dollar bonuses off what increasingly looks, at a fundamental level, like castles in the sand. Why do the labour?

  • It's a trick question for an artificial intelligence that tokenizes words. Humans have plenty of different weaknesses.

    >Why would you offer up an easy out for them like this? You're not the PR guy for the firm swimming in money paying million dollar bonuses off what increasingly looks, at a fundamental level, like castles in the sand. Why do the labour?

    I deeply hate OpenAI and everything it stands for. But I can't deny the fact that they're +/- dominating the market and releasing SOTA models on a regular basis; trying to understand why and how it fails seems important so as not to get left behind.

    • It’s a more difficult question for LLMs due to tokenization, but far from a trick one. There’s no word play or ambiguity involved.

  • the extra bounce was my favorite part!

    • I mean if it was a Black Mirror satire moment it would rapidly become part of meme culture.

      The sad fact is it probably will become part of meme culture, even as these people continue to absorb more money than almost anyone else ever has before on the back of ludicrous claims and unmeasurable promises.

  • > This is not a demonstration of a trick question.

    It's a question that purposefully exploits a limitation of the system. There are many such questions for humans. They are called trick questions. It is not that crazy to call it a trick question.

    > This is a demonstration of a system that delusionally refuses to accept correction and correct its misunderstanding (which is a thing that is fundamental to their claim of intelligence through reasoning).

    First, the word 'delusional' is strange here unless you believe we are talking about a sentient system. Second, you are just plain wrong: LLMs are not "unable to accept correction" at all; in fact they often accept incorrect corrections (sycophancy). In this case the model is simply unable to understand the correction, because of the nature of the tokenizer (see the sketch at the end of this comment), and it is therefore 'correct' behaviour for it to insist on its incorrect answer.

    > Why would anyone believe these things can reason, that they are heading towards AGI, when halfway through a dialogue where you're trying to tell it that it is wrong it doubles down with a dementia-addled explanation about the two bs giving the word that extra bounce?

    People believe the models can reason because they produce output consistent with reasoning. (That is not to say they are flawless or we have AGI in our hands.) If you don't agree, provide a definition of reasoning that the model does not meet.

    > Why would you offer up an easy out for them like this? You're not the PR guy for the firm swimming in money paying million dollar bonuses off what increasingly looks, at a fundamental level, like castles in the sand. Why do the labour?

    This, like many of your other messages, is rather obnoxious and dripping with performative indignation while adding little in the way of substance.
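
    As a footnote to the tokenizer point above, here's a minimal sketch of why letter-counting questions are awkward for these models, assuming the tiktoken library and its o200k_base encoding; the exact split varies by model, so treat the output as illustrative only.

    ```python
    # Minimal sketch: show how a word is split into tokens before the model
    # ever sees it. Assumes the tiktoken library (pip install tiktoken) and
    # the o200k_base encoding; other models use other encodings.
    import tiktoken

    enc = tiktoken.get_encoding("o200k_base")

    word = "blueberry"
    token_ids = enc.encode(word)
    pieces = [enc.decode([t]) for t in token_ids]

    # The model receives a short sequence of multi-character chunks rather
    # than nine individual letters, which is part of why "count the b's"
    # is harder for it than it looks.
    print(token_ids)
    print(pieces)
    ```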