Comment by tyre

1 month ago

I wish your recent interview had pushed much harder on this. It came across as politely not wanting to bring up how poorly this really went, even for what the engineer intended.

They were making claims without the level of rigor to back them up. There was an opportunity to learn some difficult lessons, but—and I don’t think this was your intention—it came across to me as kind of access journalism; not wanting to step on toes while they get their marketing in.

blibble 1 month ago

pushing would definitely stop the supply of interviews/freebies/speaking engagements

moomoo11 1 month ago

Why would he push back? His whole schtick is to sell only AI hype. He’s not going to hurt his revenue.

simonw 1 month ago

If I sell only AI hype why do I keep telling people that many systems built on top of LLMs are inherently insecure? https://simonwillison.net/2025/Jun/16/the-lethal-trifecta/
square_usual 1 month ago
That's a great way to tell on yourself that you've never read Simon's work.
- GoatInGrey 1 month ago
  
  On the contrary, we get to read hundreds of his comments explaining how the LLM in anecdote X didn't fail, it was the developer's fault and they should know better than to blame the LLM.
  I only know this because on occasion I'll notice there was a comment from them (I only check the name of the user if it's a hot take) and I ctrl-F their username to see 20-70 matches on the same thread. Exactly 0 of those comments present the idea that LLMs are seriously flawed in programming environments regardless of who's in the driver seat. It always goes back to operator error and "just you watch, in the next 3 months or years...".
  I dunno, I manage LLM implementation consulting teams and I will tell you to your face that LLMs are unequivocally shit for the majority of use cases. It's not hard to directly criticize the tech without hiding behind deflections or euphemisms.
  
- moomoo11 1 month ago
  
  I literally see their posts every (other) day, and it's always glazing something that doesn't fully work (but is kind of cool at a glance) or is really just hyped beyond belief.
  Comments usually point out the issues or more grounded reality.
  BTW I'm bullish on AI, going through 100s of millions of tokens per month.
- blibble 1 month ago
  
  the bare minimum of criticism to allow independence to be claimed?
sealeck 1 month ago

I actually don't think this is true. Among people who cover LLMs, Simon Willison is certainly one of the more critical and measured.

well_ackshually 24 days ago

The person you're responding to isn't a journalist, they're a mouthpiece. Pushing means they don't get these interviews anymore.

The quality of whatever they put out as a result of it is yours to take into consideration.

simonw 1 month ago

I just don't think that's the case.

The claims they made really weren't that extreme. In the blog post they said:

> To test this system, we pointed it at an ambitious goal: building a web browser from scratch. The agents ran for close to a week, writing over 1 million lines of code across 1,000 files. You can explore the source code on GitHub.

> Despite the codebase size, new agents can still understand it and make meaningful progress. Hundreds of workers run concurrently, pushing to the same branch with minimal conflicts.

That's all true.

On Twitter their CEO said:

> We built a browser with GPT-5.2 in Cursor. It ran uninterrupted for one week.

> It's 3M+ lines of code across thousands of files. The rendering engine is from-scratch in Rust with HTML parsing, CSS cascade, layout, text shaping, paint, and a custom JS VM.

> It kind of works! It still has issues and is of course very far from Webkit/Chromium parity, but we were astonished that simple websites render quickly and largely correctly.

That's mostly accurate too, especially the "it kind of works" bit. You can take exception to the "from-scratch" claim if you like. It's a tweet; the lack of nuance isn't particularly surprising.

In the overall genre of CEOs over-hyping their company's achievements this is a pretty weak example.

I think the people making out that Cursor massively and dishonestly over-hyped this are arguing with a straw man version of what the company representatives actually said.

mikkupikku 1 month ago

> That's mostly accurate too, especially the "it kind of works" bit. You can take exception to "from-scratch" claim if you like. It's a tweet, the lack of nuance isn't particularly surprising.
> In the overall genre of CEO's over-hyping their company's achievements this is a pretty weak example
I kind of agree, but kind of not. The tweet isn't too bad when read from an experienced engineer's perspective, but if we're being real, the target audience was probably technically clueless investors who don't and can't understand the nuance.
mjr00 1 month ago
What people take issue with is the claim that agents built a web browser "from scratch" only to find by looking deeper that they were using Servo, WGPU, Taffy, winit, and other libraries which do most of the heavy lifting.
It's like claiming "my dog filed my taxes for me!" when in reality everything was filled out in TurboTax and your dog clicked the final submit button. Technically true, but clearly disingenuous.
I'm not saying an LLM using existing libraries is a bad thing--in fact I'd consider an LLM which didn't pull in a bunch of existing libraries for the prompt "build a web browser" to be behaving incorrectly--but the CEO is misrepresenting what happened here.
- square_usual 1 month ago
  
  Did you read the comment that started this thread? Let me repeat that, ICYMI:
  > "So I agree this isn't just wiring up of dependencies, and neither is it copied from existing implementations: it's a uniquely bad design that could never support anything resembling a real-world web engine."
  It didn't use Servo, and it wasn't just calling dependencies. It was terribly slow and stupid, but your comment is more of a mischaracterization than anything the Cursor people have said.
  
- simonw 1 month ago
  
  I agree that "from scratch" is a misrepresentation.
  But it was accompanied by a link to the GitHub repo, so you can hardly claim that they were deliberately hiding the truth.
  
dns_snek 1 month ago

> I think the people making out that Cursor massively and dishonestly over-hyped this are arguing with a straw man version of what the company representatives actually said.
It's far more dishonest to search for contrived interpretations of their statements in an attempt to frame them as "mostly accurate" when their statements are clearly misleading (and in my opinion, intentionally so).
You're giving them infinite benefit of the doubt where they deserve none, as this industry is well known for intentionally misleading statements; you're brushing off serious factual misrepresentations as simple "lack of nuance"; and finally, you're trying to discredit people who have an issue with all of this.
With all due respect, that's not the behavior of a neutral reporter but of someone who's heavily invested in maintaining a certain narrative.
dminik 1 month ago
According to the Twitter analytics you can see on the post (at least on nitter), the original
> We built a browser with GPT-5.2 in Cursor. It ran uninterrupted for one week.
tweet was seen by over 6 million people.
The follow-up tweet, which includes the link to the actual details, was seen by fewer than 200,000.
That's just how Twitter engagement works and these companies know it. Over 6 million people were fed bullshit. I'm sorry, but it's actually a great example of CEOs over-hyping their products.
- simonw 1 month ago
  
  That Tweet that was seen by 6 million people is here: https://x.com/mntruell/status/2011562190286045552
  You only quoted the first line. The full tweet includes the crucial "it kind of works" line - that's not in the follow-up tweet, it's in the original.
  Here's that first tweet in full:
  > We built a browser with GPT-5.2 in Cursor. It ran uninterrupted for one week.
  > It's 3M+ lines of code across thousands of files. The rendering engine is from-scratch in Rust with HTML parsing, CSS cascade, layout, text shaping, paint, and a custom JS VM.
  > It kind of works! It still has issues and is of course very far from Webkit/Chromium parity, but we were astonished that simple websites render quickly and largely correctly.
  The second tweet, with only 225,000 views, was just the following text and a link to the GitHub repository:
  > Excited to continue stress testing the boundaries of coding agents and report back on what we learn.
  > Code here: https://github.com/wilsonzlin/fastrender
testdelacc1 1 month ago
The fact that the codebase is meaningless drivel has already been established; you don’t need to defend them. It’s just pure slop, and they’re trying to get people to believe that it’s a working browser. At the time he bragged about it, `cargo build` didn’t even run! It was completely broken going back a hundred commits. So it was a complete lie to claim that it “kind of works”.
You have a reputation. You don’t need to carry water for people who are misleading people to raise VC money. What’s the point of you language lawyering about the precise meaning of what he said?
“No no, you don’t get it guys. I’m technically right if you look at the precise wording” is the kind of silly thing I do all the time. It’s not that important to be technically right. Let this one go.
- simonw 1 month ago
  
  Which part of their CEO saying "It kind of works" are you interpreting as "trying to get people to believe that it’s a working browser"?
  The reason I won't let this one go is that I genuinely believe people are being unfair to the engineer who built this, because some people will jump on ANY opportunity to "debunk" stories about AI.
  I won't stand for misleading rhetoric like "it's just a Servo wrapper" when that isn't true.
  