
Comment by wilsonzlin

20 hours ago

I've responded to this claim in more detail at [0], with additional context at [1].

Briefly, the project implemented substantial components, including a JS VM, DOM, CSS cascade, inline/block/table layout, paint systems, text pipeline, and chrome, and is not merely a Servo wrapper.

[0] https://news.ycombinator.com/item?id=46655608

Just for context, this was the original claim by Cursor's CEO on Twitter:

> We built a browser with GPT-5.2 in Cursor. It ran uninterrupted for one week.

> It's 3M+ lines of code across thousands of files. The rendering engine is from-scratch in Rust with HTML parsing, CSS cascade, layout, text shaping, paint, and a custom JS VM.

> It kind of works! It still has issues and is of course very far from Webkit/Chromium parity, but we were astonished that simple websites render quickly and largely correctly.

https://xcancel.com/mntruell/status/2011562190286045552#m

Could you clarify somewhere exactly how much of the code was built "autonomously" vs. how much was steered by humans? At this point it's clear that it wasn't 100% autonomous as originally claimed, but it's still not clear whether this was just the work of an engineer running Cursor or a genuinely "autonomously organised fleet of agents".

You're claiming that the JS VM was implemented. Is it actually running? This screenshot shows the Acid3 test asking you to enable JavaScript (https://imgur.com/fqGLjSA). Why don't you upload a video of yourself loading this page?

Your slop is worthless except to convince gullible investors to give you more money.

Does any of it actually work? Can you build that JS VM separately and run serious JS on it? That would be an accomplishment.

Looking at the comments and claims (I haven't got the time to review a large code base just to check this claim), I get the impression _something_ was created, but none of it actually builds and no one knows what the actual plan is.

Did your process not involve recursive planning stages? In my experience these ALWAYS have big architectural errors and gotchas, unless you're doing a small toy project or something the AI has seen thousands of times already.

I find agents do pretty well once you have a human correct their bad assumptions and architectural errors. But this assumes the human has absolute understanding of what is being done, down to the tiniest component. Agents left to their own devices will discover errors only at the very end, after spending dozens of millions of tokens; then they will try the next idea they hallucinated, spend another few dozen million tokens, and so on. Perhaps after 10 iterations like this they may arrive at something fine, but more likely they will descend into hallucination hell.

This is what happens when the complexity, the size, or the novelty of the task (often a mix of all three) exceeds the capability of the agents.

The true way to success is a human-AI hybrid, but you absolutely need a human who knows their stuff.

Let me give you a small example from the systems field. The other day I wanted to design an AI observability system with the following spec:

- use existing OSS components, with none or as little custom code as possible
- ideally runs on stateless pods on an air-gapped k3s cluster (preferably uses one of the existing DBs, but ClickHouse is acceptable)
- able to proxy OpenAI, Anthropic (both the API and Claude Max), Google (Vercel + Gemini), DeepInfra, and OpenRouter, including client auth (so it is completely transparent to the client)
- reconstructs streaming responses, recognises tool calls and reasoning content; nice to have: the ability to define my own session/conversation recognition rules

I used Gemini 3 and Opus 4.5 for the initial planning and comparison of OSS projects that could be useful. Both converged on Helicone as supposedly the best option. Then, towards the very end of implementation, it turned out Helicone has pretty much zero docs for properly setting up the self-hosted platform and tries redirecting to their web page for auth, and the agents immediately went into rewriting parts of the source, attempting to write their own auth and fixing imaginary bugs that were really misconfiguration.

Then another product was recommended (I forget which), where, upon very detailed questioning and requests to re-confirm the actual configs for multiple features that were supposedly supported, it turned out it didn't pass through auth for Claude Max.

Eventually I chose LiteLLM + Langfuse (which had been turned down initially in favour of Helicone), and I needed to make a few small code changes so that Claude Max auth could be read, additional headers could be passed through, and within a single endpoint it could send Claude Max telemetry as pure pass-through and real LLM API calls through its "models" engine (so it recognised tool calls and so on).
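For anyone curious about the general shape of that setup, here is a minimal sketch of the LiteLLM-to-Langfuse wiring at the SDK level (not my patched proxy config; the model name, host, and keys are placeholders):

    import os
    import litellm

    # Placeholders: point these at a self-hosted Langfuse instance on the cluster.
    os.environ["LANGFUSE_PUBLIC_KEY"] = "pk-lf-placeholder"
    os.environ["LANGFUSE_SECRET_KEY"] = "sk-lf-placeholder"
    os.environ["LANGFUSE_HOST"] = "http://langfuse.observability.svc:3000"

    # Log every successful completion to Langfuse.
    litellm.success_callback = ["langfuse"]

    # Provider credentials come from the usual env vars (e.g. ANTHROPIC_API_KEY).
    # One call signature for any provider LiteLLM knows about; responses
    # (including tool calls) come back in the OpenAI-compatible format.
    response = litellm.completion(
        model="anthropic/claude-3-5-sonnet-20240620",
        messages=[{"role": "user", "content": "ping"}],
    )
    print(response.choices[0].message.content)

The proxy version is the same idea with the callback declared in the config file; the Claude Max auth and the extra header pass-through were the parts that needed the small patches.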

I cannot make these two statements true at the same time in my head:

> Briefly, the project implemented substantial components, including a JS VM

and from the linked reply:

> vendor/ecma-rs as part of the browser, which is a copy of my personal JS parser project vendored to make it easier to commit to.

If it's using a copy of your personal JS parser that you decided it should use, then it didn't implement it "autonomously". The references you're linking don't add up to the brief you've provided.

What the fuck is going on?

Did you actually review these implementations and compare them to Servo (and WebKit)? Can you point to a specific part or component that was fully created by the LLM but doesn't clearly resemble anything in existing browser engines?