Comment by embedding-shape

3 days ago

Did anyone manage to run the tests from the repository itself? The code seems filled with errors and warnings, and as far as I can tell none of them are caused by the platform I'm on (Linux). I went and looked at the Action workflow history for some pages, and it seems CI has been failing for a while; the PRs have all been failing CI too, but were merged anyway. How exactly was this verified to be something that can be used as a successful example, or am I misunderstanding what point they are trying to make? They mention a screenshot, but they never actually say whether their goal was successfully met, do they?

I'm not sure the approach of "completely autonomous coding" is the right way to go. I feel like we'll be able to use these agents more effectively if we think of them as tools a human uses to accomplish something, and lean into letting the human drive, because quality spirals out of control so quickly.

I found the codebase very hard to navigate. Hundreds (over a thousand?) of tiny files with fewer than 200 lines of code each, in deeply nested subdirectories. I wanted to find where the JavaScript engine and the DOM implementation were located, and I couldn't easily find either, even using the GitHub search feature. I'm not exactly sure what this browser implements and how.

Even their README is kind of crappy. Ideally you want installation instructions right near the top, but it's broken into multiple files. The README link that says "running + architecture" (but the file is actually called browser_ui.md???) is hard to follow. There is no explicit list of dependencies, and again no explanation of how JavaScript execution works, or how rendering works, really.

It's impressive that they got such a big project to be built by agents and to compile, but this codebase... Feels like AI slop, and you couldn't pay me to maintain it. You could try to get AI agents to maintain it, but my prediction is that past some scale, they would have a hard time figuring out their own mess. You would just be left with permanent bugs you can't easily fix.

  • So the chain of events here is: copy existing tutorials and public/available code, train the model to spit it out-ish when asked, a mature-ish specification is used, and now they jitter and jumble towards a facsimile of a junior copy paste outsourcing nightmare they can’t maintain (creating exciting liabilities for all parties involved).

    I can’t shake the feeling that simply being shameless about copy-paste (i.e. copyright infringement) would let existing tools do much the same faster and more efficiently. Download Chromium, search-replace ‘Google’ with ‘ME!’, run Make… if I put that in a small app someone would explain that’s actually solvable as a bash one-liner.

    There’s a lot of utility in better search and natural language interactions. The siren call of feedback loops plays with our sense of time and might be clouding our sense of progress and utility.

    • You raise a good point, which is that autonomous coding needs to be benchmarked on designs/challenges where the exact thing being built isn't part of the model's training set.

  • To steelman the vibecoders’ perspective, I think the point is that the code is not meant for you to read.

    Anyone who has looked at AI art, read AI stories, listened to AI music, or really interacted with AI in any meaningfully critical way would recognize that this was the only predictable result given the current state of AI generated “content”. It’s extremely brittle, and collapses at the smallest bit of scrutiny.

    But I guess (to continue steelmanning) the paradigm has shifted entirely. Why do we even need an entire browser for the whole internet? Why can’t we just vibe code a “browser” on demand for each web page we interact with?

    I feel gross after writing this.

    • If it's not meant to be read, and not meant to be run since it doesn't compile and doesn't seem to have been able to for quite some time, what is this meant to demonstrate?

      That agents can write a bunch of code by themselves? We already knew that, and what's even the point of that if the code doesn't work?

      I feel like I'm still missing what this entire project and blogpost is about. Is it supposed to be all theoretical or what's the deal?

  • > It's impressive that they got such a big project to be built by agents and to compile

    But that's the thing, it doesn't compile, has a ton of errors, and CI seems to have been broken for a long time... What exactly is supposed to be impressive here, that it managed to generate a bunch of code that doesn't even compile?

    What in the holy hackers is this even about? Am I missing something obvious here? How is this news?

> I'm not sure the approach of "completely autonomous coding" is the right way to go.

I suspect the author of the post would agree. This feels much more like an experiment to push the limits of LLMs than anything they're looking to seriously use as a product (or even the basis of a product).

I think the more interesting question is when completely autonomous coding will become the right way to go. LLMs are definitely progressing along a spectrum: Can't do it -> Can do it with help -> Can do it alone but code isn't great -> Can do it alone with good code. Right now I'd say they're only at that final step for very small projects (e.g. simple Python scripts), but it seems inevitable that they will get there for increasingly large ones.

You can stop reading the article from here:

> Today's agents work well for focused tasks, but are slow for complex projects.

What does slow mean? Slower than humans? Do they need faster GPUs? What does it even imply? Too slow to produce the next token? Too slow in attempts to be usable? Do they need human intervention?

This piece is made and written to keep the bubble inflating further.

Code filled with errors and warnings? PRs merged with failing CI?

So I guess they've achieved human parity then?

(I'll see myself out)