Comment by paulus_magnus2
1 day ago
The blog [0] is worded rather conservatively, but on Twitter [2] the claim is made much more plainly, and the hype effect is achieved [1][3].
The CEO stated "We built a browser with GPT-5.2 in Cursor"
instead of
"by dividing agents into planners and workers we managed to get them busy for weeks creating thousands of commits to the main branch, resolving merge conflicts along the way. The repo is 1M+ lines of code but the code does not work (yet)"
[0] https://cursor.com/blog/scaling-agents
[1] https://x.com/kimmonismus/status/2011776630440558799
[2] https://x.com/mntruell/status/2011562190286045552
[3] https://www.reddit.com/r/singularity/comments/1qd541a/ceo_of...
Even then, "resolving merge conflicts along the way" doesn't mean anything, as there are two trivial merge strategies that are always guaranteed to work ('ours' and 'theirs').
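For reference, a minimal sketch (Python driving git via subprocess) of what that kind of "resolution" amounts to; the helper and branch name are hypothetical, and it uses git's -X ours / -X theirs strategy options, which always produce a conflict-free merge by discarding one side of every conflicted hunk:

    import subprocess

    def merge_preferring(branch: str, side: str) -> None:
        """Merge `branch`, auto-resolving every conflicted hunk in favour of one side.

        Always completes without conflicts, but silently throws away the other
        side's changes wherever both branches touched the same lines.
        """
        assert side in ("ours", "theirs")
        subprocess.run(["git", "merge", "-X", side, branch], check=True)

    # e.g. merge_preferring("agent/feature-123", "theirs")  # hypothetical branch name

So "thousands of commits with merge conflicts resolved along the way" is compatible with an arbitrary amount of silently discarded work.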
Haha. True, CI success was not part of the PR acceptance criteria at any point.
If you view the PRs, they bundle multiple fixes together, at least according to the commit messages. The next hurdle will be to guardrail the agents so that they only implement one task and don't cheat by modifying the CI pipeline.
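One hedged sketch of such a guardrail: a Python check run as an early CI step that fails the build when the change set touches the pipeline definition itself. The protected paths and the origin/main base are assumptions, and since an agent that can edit CI can also edit this script, it only helps if the check is enforced server-side (branch protection, required checks, CODEOWNERS) rather than from inside the repo alone.

    #!/usr/bin/env python3
    """Fail the build if a change set touches CI configuration files."""
    import subprocess
    import sys

    # Assumed locations of the pipeline definition; adjust per repo.
    PROTECTED_PREFIXES = (".github/workflows/", ".gitlab-ci.yml", "Jenkinsfile")

    def changed_files(base: str = "origin/main") -> list[str]:
        """List files changed on this branch relative to the assumed base."""
        out = subprocess.run(
            ["git", "diff", "--name-only", f"{base}...HEAD"],
            capture_output=True, text=True, check=True,
        )
        return [line for line in out.stdout.splitlines() if line]

    def main() -> int:
        touched = [f for f in changed_files() if f.startswith(PROTECTED_PREFIXES)]
        if touched:
            print("CI pipeline files modified in this change set:", *touched, sep="\n  ")
            return 1
        return 0

    if __name__ == "__main__":
        sys.exit(main())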
If I had a nickel for every time I've seen a human dev disable/xfail/remove a failing test "because it's wrong" and then proceed to break production, I would have several nickels, which is not much, but it does suggest that deleting failing tests, like many of these behaviors, is not LLM-specific.
We use Claude Code a lot for updating systems to a newer minor/major version. We have our own 'base' framework for clients, which is by now a very large codebase that does 'everything you could possibly need': not only auth, but payments, billing, support tickets, email workflows, email WYSIWYG editing, a landing page editor, blogging, a CMS, AI/agent workflows, etc. (across our client base, we collect features that are 'generic' enough and build them into the base).

The base gets many updates from the product lead working on it (a senior using Claude Code), but we cannot just update our clients (whose versions are sometimes extremely customised/diverging) at the same pace; some do not want updates outside security, some want them once a year, etc.

Here AI has really been a productivity booster. Our framework always moved quite fast before AI too, when we had 3.5 FTE on it (client teams are generally much larger, especially in the first years), but the merging, that is, pulling the new features and improvements of the new framework version into the client version without breaking or removing changes on the client side, was a very painful process that took a lot of time and at least two people for an extended period: one from the client team, one from the framework team.

With CC it is much less painful: it will do the merge (it is not allowed, by hooks, to touch the tests), run the client tests and the new framework tests, and report the difference. That difference is usually evaluated by someone from the client team, who then merges and fixes the tests (mostly manually) to reflect the new reality and tests the system manually. Claude misses things (especially when functionalities are very similar but not exactly the same; it cannot really pick which one to take, so it usually does nothing), but the bulk of the work is done quickly and usually without causing issues.
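As an illustration of the "not allowed, by hooks, to touch the tests" part, here is a minimal sketch of the kind of PreToolUse hook script Claude Code can run before Edit/Write tool calls. The hook protocol details (tool call as JSON on stdin, a blocking non-zero exit code with stderr fed back to the model) and the test-path conventions are assumptions to check against the current Claude Code docs, not the poster's actual setup:

    #!/usr/bin/env python3
    """PreToolUse hook sketch: refuse edits that target test files."""
    import json
    import sys

    def is_test_path(path: str) -> bool:
        # Assumed repo conventions for where tests live.
        return path.startswith("tests/") or "/tests/" in path or "_test." in path

    def main() -> int:
        payload = json.load(sys.stdin)              # tool call details passed by the agent
        tool_input = payload.get("tool_input") or {}
        file_path = tool_input.get("file_path", "")
        if is_test_path(file_path):
            print(f"Blocked: {file_path} is a test file and must not be edited "
                  f"during framework merges.", file=sys.stderr)
            return 2  # assumed blocking exit code; stderr goes back to the model
        return 0

    if __name__ == "__main__":
        sys.exit(main())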
That's not guaranteed to work. Other parts of the codebase that didn't conflict could depend on the discarded code.
The point is that the merge conflict was resolved, regardless of whether there was a working product at the end. Which there apparently isn’t.
Well they did mention the code doesn't work.
So, AI agent battle royale
So clearly someone, at some point, managed to run this, surely? That's where the screenshots come from? I just don't understand how, given the code is riddled with errors.
Somebody managed to get it to compile https://x.com/CanadaHonk/status/2011612084719796272
But apparently "some pages take a literal minute to load"
> to be clear those 2 hours were fixing compile errors and bugs, not compile time
Seems like "I had to do the last mile myself", not "autonomous coding" which was Cursor's claim here.
Maybe they just asked an AI to create an image of a rendered webpage?
The link [0] implies that the browser worked. Can you help me understand what's "conservative" about that?
> Can you help me understand what's "conservative" about that?
It's the gaslighting.