Comment by SkiFire13

21 days ago

> It shows a remarkably consistent curve for AI completing increasingly difficult coding tasks over time.

I'm not convinced that "long" is equivalent to "difficult". Traditional computers can also solve tasks that would take extremely long for humans, but that doesn't make them intelligent.

This is not to say that this is useless, quite the opposite! Traditional computers have shown that being able to shorten the time needed for certain tasks is extremely valuable, and AI has shown this can be extended to other (but not necessarily all) tasks as well.

That's true, but length is a good proxy for three of the biggest difficulties faced by LLMs when coding:

1. Ability to take large amounts of information into consideration, specifically large codebases (longer tasks usually involve larger codebases). LLMs struggle with this due to context window limitations.

2. Ability to make and execute on long-term plans. Also related to context window limitations, as well as what for a human would be called "executive functioning skills".

3. Consistency. If you have an x% chance to get stuck on each step of a multi-step task, then the more steps, the higher the failure rate. This is true for both LLMs and humans, but LLMs tend to have more random failures, both due to hallucinations and due to being worse at recovering if their initial attempt fails (they can have a hard time remembering what they're supposed to do differently).
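The compounding in point 3 is easy to see numerically. A quick sketch (the 5% per-step failure rate is just an illustrative assumption, not a measured figure):

```python
# With an independent failure probability p at each step, the chance
# of completing all n steps of a task is (1 - p) ** n.
def completion_rate(p_fail: float, n_steps: int) -> float:
    return (1.0 - p_fail) ** n_steps

# Even a modest 5% per-step failure rate collapses over long tasks:
for n in (1, 10, 50):
    print(n, round(completion_rate(0.05, n), 3))
# 1  -> 0.95
# 10 -> 0.599
# 50 -> 0.077
```

So a model that succeeds 95% of the time per step still finishes a 50-step task less than 8% of the time, which is why small per-step reliability gains translate into large jumps in achievable task length.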

These difficulties seem to generalize beyond coding to almost any kind of knowledge work. A system that could solve them all would be, if not AGI, at least a heck of a lot closer.