← Back to context

Comment by trjordan

3 days ago

This is going to sound sarcastic, but I mean this fully: why haven't they merged that PR.

The implied future here is _unreal cool_. Swarms of coding agents that can build anything, with little oversight. Long-running projects that converge on high-quality, complex projects.

But the examples feel thin. Web browsers, Excel, and Windows 7 exist, and they specifically exist in the LLM's training sets. The closest to real code is what they've done with Cursor's codebase .... but it's not merged yet.

I don't want to say, call me when it's merged. But I'm not worried about agents ability to produce millions of lines of code. I'm worried about their ability to intersect with the humans in the real world, both as users of that code and developers who want to build on top of it.

> This is going to sound sarcastic, but I mean this fully: why haven't they merged that PR.

I would go even further, why have they not created at least one less complex project that is working and ready to be checked out? To me it sounds like having a carrot dangle in front of the face of VC investors: 'Look, we are almost there to replace legions of software developers! Imagine the market size and potential cost reductions for companies.'

LLMs are definitely an exciting new tool and they are going to change a lot. But are they worth $B for everything being stamped 'AI'? The future will tell. Looking back the dotcom boom hype felt exactly the same.

The difference with the dotcom boom is that at the time there was a lot more optimism to build a better future. The AI gold rush seems to be focused on getting giga-rich while fscking the bigger part of humanity.

>> why haven't they merged that PR.

because it is absolutely impossible to review that code and there is gazillion issues there.

The only way it can get merged is YOLO and then fix issues for months in prod which kinda defeats the purpose and brings gains close to zero.

> Long-running projects that converge on high-quality, complex projects

In my experience agents don't converge on anything. They diverge into low-quality monstrosities which at some point become entirely unusable.

  • Yeah, I don't think they're built for that either, you need a human to steer the "convergtion", otherwise they indeed end up building monstrosities.

> Web browsers, Excel, and Windows 7 exist, and they specifically exist in the LLM's training sets.

There's just a bit over 3 browsers, 1 serious excel-like and small part of windows user side. That's really not enough for training for replicating those specific tasks.

> Long-running projects that converge

This is how I think about it. I care about asymptotics. What initial conditions (model(s) x workflow/harness x input text artefacts) causes convergence to the best steady state? The number of lines of code doesn't have to grow, it could also shrink. It's about the best output.

Pretty much everything exists in the training sets. All non-research software is just a mishmash of various standard modules and algorithms.

  • Not everything, only code-bases of existing (open-source?) applications.

    But what would be the point of re-creating existing applications? It would be useful if you can produce a better version of those applications. But the point in this experiment was to produce something "from scratch" I think. Impressive yes, but is it useful?

    A more practically useful task would be for Mozilla Foundation and others to ask AI to fix all bugs in their application(s). And perhaps they are trying to do that, let's wait and see.

    • You have to be careful which codebase to try this on. I have a feeling if someone unleashed agents on the Linux kernel to fix bugs it'd lead to a ban on agents there

    • Re-creating closed source applications as open source would have a clear benefit because people could use those applications in a bunch of new ways. (implied: same quality bar)