Comment by sealeck

20 days ago

I think the other question is how far away this is from a "working" browser. It isn't impossible to render a meaningful subset of HTML (especially when you use external libraries to handle a lot of this). The real difficulty is doing this (a) quickly, (b) correctly and (c) securely. All of those are very hard problems, and also quite tricky to verify.

I think this kind of approach is interesting, but it's a bit sad that Cursor didn't discuss how they close the feedback loop: testing/verification. As generating code becomes cheaper, I think effort will shift to how we can more cheaply and reliably determine whether an arbitrary piece of code meets a desired specification. For example, did they use https://web-platform-tests.org/, fuzz testing (e.g. feed in random webpages and inform the LLM when the fuzzer finds crashes), etc.? I would imagine that truly scaling long-running autonomous coding would put an emphasis on this.

Of course Cursor may well have done this, but it wasn't super deeply discussed in their blog post.
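
To make the fuzzing idea concrete, here is a rough sketch of what such a feedback loop could look like. It is only an illustration under assumptions: `toy_render` is a made-up stand-in for the engine under test (with a deliberate bug), and `random_page` and `fuzz` are hypothetical helper names, not anything Cursor has described.

```python
import random
import re

TAGS = ["div", "p", "span", "table", "b"]


def random_page(rng, max_nodes=20):
    """Build a deliberately sloppy HTML string: tags may be unclosed or misnested."""
    parts = []
    for _ in range(rng.randint(1, max_nodes)):
        tag = rng.choice(TAGS)
        kind = rng.choice(["open", "close", "text"])
        if kind == "open":
            parts.append(f"<{tag}>")
        elif kind == "close":
            parts.append(f"</{tag}>")
        else:
            parts.append("x" * rng.randint(0, 5))
    return "".join(parts)


def toy_render(html):
    """Hypothetical engine under test; returns the number of unclosed elements."""
    stack = []
    for closing, tag in re.findall(r"<(/?)(\w+)>", html):
        if closing:
            stack.pop()  # deliberate bug: IndexError on a stray closing tag
        else:
            stack.append(tag)
    return len(stack)


def fuzz(render, runs=200, seed=0):
    """Feed random pages to `render` and collect one report per crash."""
    rng = random.Random(seed)  # seeded so every crash can be reproduced
    crashes = []
    for i in range(runs):
        page = random_page(rng)
        try:
            render(page)
        except Exception as exc:
            crashes.append({"run": i, "input": page, "error": repr(exc)})
    return crashes


if __name__ == "__main__":
    reports = fuzz(toy_render)
    # In an agentic setup, each report would be handed back to the model as a failing case.
    print(f"{len(reports)} crashing inputs found")
```

The interesting part is less the generator than the report: a reproducible input plus the error is something an agent (or a human) can act on, which is the "closing the loop" step.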

I really enjoy reading your blog and it would be super cool to see you look at approaches people have to ensuring that LLM-produced code is reliable/correct.

7 comments


simonw 20 days ago

Yeah, I'm hoping they publish a lot more about this project! It deserves way more than the few sentences they've shared about it so far.

cousinbryce 19 days ago

I’m interested to see how much more they know about the project

polyglotfacto 19 days ago

I think the current approach is simply not scalable to a working browser ever.

To leverage AI to build a working browser you would imo need the following:

- A team of humans with some good ideas on how to improve on existing web engines.

- A clear architectural story written not by agents but by humans. Architecture does not mean high-level diagrams only. At each level of abstraction, you need humans to decide what makes sense and only use the agent to bang out slight variations.

- A modular and human-overseen agentic loop approach: one agent can keep running to try to fix a specific CSS feature (like grid), with a human expert reviewing the work at some interval (not sure how fine-grained it should be). This is actually very similar to running an open-source project: you have code owners and a modular review process, not just an army of contributors committing whatever they want. And a "judge agent" is not the same thing as a human code owner acting as reviewer.

Example on how not to do it: https://github.com/wilsonzlin/fastrender/blob/19bf1036105d4e...

This rendering loop architecture makes zero sense, and it does not implement web standards.

> in the HTML Standard, requestAnimationFrame is part of the frame rendering steps (“update the rendering”), which occur after running a task and performing a microtask checkpoint

> requestAnimationFrame callbacks run on the frame schedule, not as normal tasks.

This is BS: "update the rendering" is specified as just another task, which means it needs to be followed by a microtask checkpoint. See https://html.spec.whatwg.org/multipage/#event-loop-processin...

Following the spec doesn't mean you cannot optimize rendering tasks in some way vs other tasks in your implementation, but the above is not that, it's classic AI bs.
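
To show the ordering I mean, here is a toy model of the event loop. It is a simplified sketch, not engine code and not a transcription of the spec: the class and method names are made up, and queueing the rendering task whenever the task queue is empty is an assumption standing in for real rendering opportunities.

```python
from collections import deque


class EventLoop:
    """Toy event loop: rendering is queued like any other task."""

    def __init__(self):
        self.tasks = deque()
        self.microtasks = deque()
        self.raf_callbacks = []
        self.log = []

    def queue_task(self, fn):
        self.tasks.append(fn)

    def queue_microtask(self, fn):
        self.microtasks.append(fn)

    def request_animation_frame(self, fn):
        self.raf_callbacks.append(fn)

    def microtask_checkpoint(self):
        while self.microtasks:
            self.microtasks.popleft()()

    def update_the_rendering(self):
        # Take a snapshot so callbacks registered during this frame wait for the next one.
        callbacks, self.raf_callbacks = self.raf_callbacks, []
        for cb in callbacks:
            cb()
            # Simplification of the cleanup that happens after a callback returns.
            self.microtask_checkpoint()
        self.log.append("paint")

    def run(self):
        while self.tasks or self.raf_callbacks:
            if not self.tasks:
                # Assumption: a real engine ties this to a rendering opportunity.
                self.tasks.append(self.update_the_rendering)
            task = self.tasks.popleft()
            task()
            # Every task, the rendering one included, ends with a checkpoint.
            self.microtask_checkpoint()


def demo():
    loop = EventLoop()

    def raf_cb():
        loop.log.append("raf")
        loop.queue_microtask(lambda: loop.log.append("microtask raf"))

    def task_a():
        loop.log.append("task A")
        loop.queue_microtask(lambda: loop.log.append("microtask A"))
        loop.request_animation_frame(raf_cb)

    loop.queue_task(task_a)
    loop.run()
    return loop.log


if __name__ == "__main__":
    print(demo())
```

An implementation is free to prioritize the rendering task over others, but in this model it still goes through the same run-task-then-checkpoint path as everything else.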

Understanding Web standards and translating them into an implementation requires human judgement.

Don't use an agent to draft your architecture; an expert in web standards with an interest in agentic coding is what is required.

Message to Cursor CEO: next time, instead of setting those millions on fire, reach out to me first: https://github.com/gterzian

ontouchstart 18 days ago
How much effort would it take GenAI to write a browser/engine from scratch for GenAI to consume (and generate) all the web artifacts generated by humans and GenAI? (This only needs to work in headless CI.)
How much effort would it take for a group of humans to do it?
- polyglotfacto 16 days ago
  
  I'm not sure about what you mean with your first sentence in terms of product.
  But in general, my guess at an answer (supported by the results of the experiment discussed in this thread) is that:
  - GenAI left unsupervised cannot write a browser/engine, or any other complex software. What you end up with is just chaos.
  - A group of humans using GenAI and supervising its output could write such an engine (or any other complex software), and in theory be more productive than a group of humans not using GenAI: the humans could focus on the conceptual bottlenecks, and the AI could bang out the features that require only the translation of already established architectural patterns.
  When I write conceptual bottlenecks I don't mean standing in front of a whiteboard full of diagrams. What I mean is any work that gives proper meaning and functionality to the code: it can be at the level of an individual function, or the project as a whole. It can also be outside of the code itself, such as when you describe the desired behavior of (some part of) a program in TLA+.
  For an example, see: https://medium.com/@polyglot_factotum/on-writing-with-ai-87c...
  