Comment by felipeerias
20 hours ago
IMHO people are missing the forest for the trees. The point of this experiment is not to build a functional browser but to develop ways to make agents create large codebases from scratch over a very long time span. A Web browser is just a convenient target because plenty of documentation, specs, and tests are available.
The point is to learn how to make very large codebases that don't compile? Why do you need tests and specs if it's not even going to run, much less run correctly?
As discussed elsewhere, it is apparently possible to compile and run this particular project. It seems that whatever process they followed allowed commits to break the build pretty often.
Nevertheless, IMHO what’s interesting about this is not the browser itself but rather that AI companies (not just Cursor) are building systems where humans can be out of the loop for days or weeks.
> As discussed elsewhere, it is apparently possible to compile and run this particular project.
After a human stepped in to fix it, yes. You can see it yourself here: https://github.com/wilsonzlin/fastrender/issues/98
> Nevertheless, IMHO what’s interesting about this is not the browser itself but rather that AI companies (not just Cursor) are building systems where humans can be out of the loop for days or weeks.
But that's not what they demonstrated here. What they demonstrated, so far, is that you can let agents write millions of lines of code, and if you eventually need to actually run it, some human needs to "merge the latest snapshot" or do some other management work to get the system into a workable state.
Very different from what they originally claimed.
...but it didn't develop ways of doing that, did it?
Any idiot can have Cursor run for 2 weeks and produce a pile of crap that doesn't compile.
You know the brilliant insight they came out with?
> A surprising amount of the system's behavior comes down to how we prompt the agents. Getting them to coordinate well, avoid pathological behaviors, and maintain focus over long periods required extensive experimentation. The harness and models matter, but the prompts matter more.
i.e. It's kind of hard and we didn't really come up with a better solution than 'make sure you write good prompts'.
Wellll, geeeeeeeee! Thanks for that insight guys!
Come on. This was complete BS. Planners and workers. Cool. Details? Any details? Annnnnnnyyyyy way to replicate it? What sort of prompts did you use? How did you solve the pathological behaviours?
Nope. The vagueness in this post... it's not an experiment. It's just fund raising hype.
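For what it's worth, even a throwaway sketch of the planner/worker loop would have told us more than the post did. Something like the toy below, and to be clear this is entirely my own guess at the shape of it, nothing here is from Cursor: a planner prompt that splits the goal into tasks, a worker prompt per task, and a call_model() stub standing in for whatever LLM client you'd actually use.

    # Purely illustrative sketch of a planner/worker agent loop. Nothing here
    # comes from Cursor's post; the prompts and the call_model() stub are my
    # own guesses at the shape of such a harness.
    from dataclasses import dataclass
    from queue import Queue

    @dataclass
    class Task:
        description: str
        result: str | None = None

    def call_model(prompt: str) -> str:
        """Stand-in for an LLM call; swap in whatever chat-completion client you use."""
        return f"[model output for: {prompt[:40]}...]"

    def plan(goal: str) -> list[Task]:
        # Planner: ask the model to split the goal into small, independent tasks.
        response = call_model(f"Break this goal into small, independent coding tasks:\n{goal}")
        return [Task(line.strip()) for line in response.splitlines() if line.strip()]

    def work(task: Task) -> Task:
        # Worker: each task gets its own prompt (and, in a real system, its own context and branch).
        task.result = call_model(f"Complete this task and return a patch:\n{task.description}")
        return task

    def run(goal: str) -> list[Task]:
        queue: Queue[Task] = Queue()
        for task in plan(goal):
            queue.put(task)
        done: list[Task] = []
        while not queue.empty():
            done.append(work(queue.get()))
        # Conspicuously absent, and exactly what the post never explains: merging
        # the resulting patches, keeping the build green, and detecting the
        # "pathological behaviors" they mention.
        return done

    if __name__ == "__main__":
        for t in run("Build a toy HTML parser"):
            print(t.description, "->", t.result)

Even a toy like that would at least show where the actual hard problems live; instead all we got was "the prompts matter more".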
IMHO, this whole thing could be read with "human" instead of "agent" and would make the exact same amount of sense.
"We put 200 human in a room and gave them instructions how to build a browser. They coded for hours, resolving merge conflicts and producing code that did not build in the end without intervention of seniors []. We think, giving them better instructions leads to better results"
So they actually invented humans? And will it come down to either "managing humans" or "managing agents"? One of the two will be more reliable, more predictable and more convenient to work with. And my guess is, it is not the agent...
Judging by the git log, something is weird.