Comment by danieltanfh95

7 hours ago

I think the discussion has to be more nuanced than this. "LLMs still can't do X so it's an idiot" is a bad line of thought. LLMs with harnesses are clearly capable of engaging with logical problems that only need text. LLMs are not there yet with images, but we are improving with UI and access to tools like Figma. LLMs are clearly unable to propose new, creative solutions for problems it has never seen before.

> LLMs are clearly unable to propose new, creative solutions for problems it has never seen before.

LLMs are incredibly useful but I'm not sure about this statement.

It proposes things I haven't seen before, but I don't know whether they are new or creative relative to the entirety of collective human knowledge.

> LLMs with harnesses are clearly capable of engaging with logical problems that only need text.

To some extent. It's not clear where specifically the boundaries are, but it seems to fail to approach problems in ways that aren't embedded in the training set. I certainly would not put money on it solving an arbitrary logical problem.

  • > To some extent. It's not clear where specifically the boundaries are, but it seems to fail to approach problems in ways that aren't embedded in the training set. I certainly would not put money on it solving an arbitrary logical problem.

    In what way can you falsify this without the LLM being omniscient? We have examples of it solving things that are not in its training set: it found vulnerabilities in 25-year-old BSD code that had gone unspotted by humans. They were not trivial ones, either.

    • Here's an odd example of testing, but I design very complex board and card games, and LLMs are terrible at figuring out whether they make sense or really even restating the rules in a different wording.

      I thought they would be ideal for the job, until I realized that it would just pretend that the rules worked because they looked like board game rules. The more you ask it to restate, manipulate or simulate the rules, the more you can tell that it's bluffing. It literally thinks every complicated set of rules works perfectly.

      > it found vulnerabilities in 25-year-old BSD code that had gone unspotted by humans.

      I don't think the age of the code makes the problem more complex. Finding buffers that are too small is not rocket science; bothering to look at some corner of a codebase you've never paid attention to or seen a problem with is. AI being cheap enough to sic on parts of a codebase nobody ever carefully examines is a great thing, but it's not genius on the part of the AI.


  • Solving arbitrary logical problems seems equivalent to solving the halting problem, so you are probably wise not to make that bet.
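The halting-problem point can be made concrete with the classic diagonal argument: hand any claimed "decide whether this program halts" oracle a program built from the oracle itself, and the oracle must be wrong about it. A minimal Python sketch, with the infinite loop represented symbolically so the example actually runs (all names here are illustrative, not from the thread):

```python
def diagonal(halts):
    """Given any claimed halting decider, build the program it misjudges."""
    def g():
        if halts(g):
            return "loops forever"   # stands in for an actual infinite loop
        return "halts"
    return g

# Two concrete (and necessarily broken) deciders:
always_yes = lambda program: True    # claims every program halts
always_no = lambda program: False    # claims every program loops

g1 = diagonal(always_yes)            # always_yes says g1 halts; g1 loops
g2 = diagonal(always_no)             # always_no says g2 loops; g2 halts
```

Every decider, however sophisticated, errs on its own diagonal program, which is why betting on "solves arbitrary logical problems" is a losing wager.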

> "LLMs still can't do X so it's an idiot"

Let’s be careful: that’s a straw man. I don’t know anyone who says that. Aphyr acknowledges in the article that AIs can do things. But they have been marketed as “intelligent,” and I agree with Aphyr that the word suggests far more than AIs currently deliver. They do not reason, they do not think, and they are not truly intelligent. As the article says, they are big wads of linear algebra. Sometimes, that’s useful.

  • > They do not reason

    How do you disprove it?

    • We know that they do not reason because we know the algorithm behind the curtain. The model generates the next token from its weights plus some randomness; that’s all. It is not reasoning. It can have the appearance of reasoning, but not if you know how it works. It doesn’t matter that the manufacturer’s marketing department slaps a “Reasoning!” sticker on the side of the model; it’s not actually doing that. As an analogy, a stage magician in Las Vegas may seem to make a woman disappear and a tiger appear in her place, but we all know that’s not what is really happening. It’s just a clever trick.
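The "weights plus some randomness" loop described above can be sketched in a few lines: scores come out of the model, a softmax turns them into probabilities, and a weighted dice roll picks the token. This is a toy illustration of the sampling step only, not any real model's code; the vocabulary and scores are made up:

```python
import math
import random

def softmax(logits, temperature=1.0):
    """Turn raw model scores into a probability distribution."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)                          # subtract max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def sample_next_token(vocab, logits, temperature=1.0, rng=random):
    """The whole 'decision': one weighted random draw over the vocabulary."""
    probs = softmax(logits, temperature)
    return rng.choices(vocab, weights=probs, k=1)[0]

vocab = ["the", "cat", "sat"]    # toy vocabulary
logits = [2.0, 1.0, 0.1]         # made-up scores a model might emit
token = sample_next_token(vocab, logits, temperature=0.7)
```

Lower temperatures sharpen the distribution toward the top-scoring token; higher ones flatten it, which is the knob people turn between "predictable" and "creative" output.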