Comment by simonw

3 days ago

Honestly, grilling him about what the CEO had tweeted didn't even cross my mind.

I wanted to get to the truth of what had actually been built and how. If that contradicts what the CEO said, then great, the truth is now out there - anyone is free to call that out and use my video as ammunition.

I just had a look to see what Michael Truell had said about the project, here it is: https://x.com/mntruell/status/2011562190286045552

> We built a browser with GPT-5.2 in Cursor. It ran uninterrupted for one week.

> It's 3M+ lines of code across thousands of files. The rendering engine is from-scratch in Rust with HTML parsing, CSS cascade, layout, text shaping, paint, and a custom JS VM.

> It kind of works! It still has issues and is of course very far from Webkit/Chromium parity, but we were astonished that simple websites render quickly and largely correctly.

This doesn't strike me as the world's most dishonest tweet, though it does exaggerate what was achieved. There IS a JS VM in there, but it's feature-flagged off. The "from-scratch" claim is misleading because there are libraries handling certain aspects - most notably Taffy - which we discussed in the interview.

I just ran "cloc" and, to my surprise, it counted 3,036,403 lines (I had thought the 3M was an exaggeration), though only 1,658,651 of those were Rust.

"It kind of works" is a fair assessment IMO!

I don't think "Let's talk about your CEO exaggerating what you built on Twitter" would have added much to the interview.

I did make sure to go over the controversies I thought were material to the project, which is why I dug into the dependencies and talked about QuickJS and Taffy.

> Honestly, grilling him about what the CEO had tweeted didn't even cross my mind.

That's not the full extent of what I meant, either. I'm assuming you also read the initial blog post they published? It also has a bunch of similarly inaccurate statements.

> I wanted to get to the truth of what had actually been built and how.

It's a shame that you seem to have reviewed it in the state it was in after a human stepped in to fix the codebase, which happened well after they first published the blog post. Maybe it compiles and builds now, but how does that address the fact that it didn't at the time of publishing?

There is a "hole" of two days without commits, presumably while the engineer was busy writing the blog post, and that's the point they "sold" as "this is what was produced by the experiment". Letting them spend more human engineering time patching the codebase, and then reviewing it after the human fixed it, seems to completely miss the point.

> I don't think "Let's talk about your CEO exaggerating what you built on Twitter" would have added much to the interview.

What would have added a whole lot more to the ecosystem's understanding of how feasible this sort of thing actually is in reality would have been to talk about what the same person you interviewed first wrote in the blog post, and what turned out to be real at the time they published it.

  • The blog post had just a couple of paragraphs about the browser project, all of them accurate: https://cursor.com/blog/scaling-agents#running-for-weeks

    > To test this system, we pointed it at an ambitious goal: building a web browser from scratch. The agents ran for close to a week, writing over 1 million lines of code across 1,000 files. You can explore the source code on GitHub.

    > Despite the codebase size, new agents can still understand it and make meaningful progress. Hundreds of workers run concurrently, pushing to the same branch with minimal conflicts.

    The commits that knocked the project into shape so other people could build the code were handled by agents as well.

    I really don't think there's a notable scandal here.

      • I don't think there is a "scandal" here either; companies lie and exaggerate all the time, and it's becoming normalized. With that said, I think it's important to record when it happens and exactly how it happens, because not only does it help people in the future know what to look out for, it also serves as a historical record to refer to when you start to see repeating patterns.

      Agree to disagree about "all of them accurate". I've already made my case elsewhere, and it doesn't really help anyone to reiterate here what's already public.
