Comment by comex

21 days ago

That's true, but length is a good proxy for three of the biggest difficulties faced by LLMs when coding:

1. Ability to take large amounts of information into consideration, specifically large codebases (longer tasks usually involve larger codebases). LLMs struggle with this due to context window limitations.

2. Ability to make and execute on long-term plans. This is also related to context window limitations, as well as to what in a human would be called "executive functioning skills".

3. Consistency. If you have an x% chance of getting stuck on each step of a multi-step task, then the more steps there are, the higher the overall failure rate, since failures compound. This is true for both LLMs and humans, but LLMs tend to have more random failures, both due to hallucinations and due to being worse at recovering if their initial attempt fails (they can have a hard time remembering what they're supposed to do differently).
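The compounding effect in point 3 can be sketched with a toy model. The numbers here are made up for illustration, and it assumes each step fails independently (real failures tend to correlate), but it shows how quickly per-step reliability erodes over long tasks:

```python
def task_success_rate(per_step_success: float, steps: int) -> float:
    """Probability of completing every step, assuming independent failures."""
    return per_step_success ** steps

# A hypothetical agent that succeeds 95% of the time on any single step:
print(round(task_success_rate(0.95, 1), 3))   # 0.95  -- near-certain on one step
print(round(task_success_rate(0.95, 10), 3))  # 0.599 -- coin flip on ten steps
print(round(task_success_rate(0.95, 50), 3))  # 0.077 -- almost always fails on fifty
```

The same arithmetic applies to humans, of course; the difference the comment points to is that an agent that can't recover from a stuck step turns each per-step failure into a whole-task failure.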

These difficulties seem to generalize beyond coding to almost any kind of knowledge work. A system that could solve them all would be, if not AGI, at least a heck of a lot closer.