Comment by deng

1 day ago

Even then, "resolving merge conflicts along the way" doesn't mean anything, as there are two trivial merge strategies that are always guaranteed to work ('ours' and 'theirs').
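For reference, here is what those two trivial resolutions look like in a toy repo (branch and file names are illustrative). Note that `ours` is a true merge strategy that ignores the other tree entirely, while "theirs" only exists as a strategy *option* to the default recursive/ort strategy:

```shell
# Toy repo with a guaranteed conflict on app.txt.
tmp=$(mktemp -d) && cd "$tmp"
git init -q -b main
git config user.email dev@example.com && git config user.name dev
echo base > app.txt && git add app.txt && git commit -qm base

git checkout -qb feature
echo their-change > app.txt && git commit -qam feature-edit
git checkout -q main
echo our-change > app.txt && git commit -qam main-edit

# 'ours' strategy: records a merge but keeps our tree untouched.
git merge -s ours -m "merge (ours)" feature
cat app.txt            # still "our-change"

# "theirs" is not a standalone strategy; the built-in equivalent is the
# recursive/ort strategy option, which takes their side of each conflict.
git reset -q --hard HEAD~1
git merge -X theirs -m "merge (theirs)" feature
cat app.txt            # now "their-change"
```

Both commands always "succeed", which is exactly why conflict-free merging alone proves nothing about correctness.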

Haha. True, CI success was not part of PR accept criteria at any point.

If you view the PRs, they bundle multiple fixes together, at least according to the commit messages. The next hurdle will be to guardrail agents so that they only implement one task and don't cheat by modifying the CI pipeline.

  • If I had a nickel for every time I've seen a human dev disable/xfail/remove a failing test "because it's wrong" and then proceed to break production, I would have several nickels, which is not much, but does suggest that deleting failing tests, like many behaviors, is not LLM-specific.

    • > but does suggest that deleting failing tests, like many behaviors, is not LLM-specific.

      True, but it is shocking how often claude suggests just disabling or removing tests.

    • If I had a nickel for every time I’ve seen a human being pull down their pants and defecate in the middle of the street I’d have a couple nickels. That’s not a lot but it suggests that this behavior is not LLM specific.

    • Had humans not been doing this already, I would have walked into Samsung with the demo application that was working an hour before my meeting, rather than the android app that could only show me the opening logo.

      There are a lot of really bad human developers out there, too.

We use Claude Code a lot for updating systems to a newer minor/major version. We have our own 'base' framework for clients, which is by now a very large codebase that does 'everything you can possibly need': not only auth, but payments, billing, support tickets, email workflows, email WYSIWYG editing, a landing page editor, blogging, CMS, AI/agent workflows, etc. (across our client base, we collect features that are 'generic' enough and build them into the base). It gets many updates from the product lead working on it (a senior using Claude Code), but we cannot just update our clients (whose versions are sometimes extremely customised/diverging) at the same pace; some do not want updates outside security, some want them once a year, etc.

In this case AI has really been a productivity booster. Our framework always moved quite fast before AI too, when we had 3.5 FTE on it (client teams are generally much larger, especially in the first years), but merging, by which I mean including the new features and improvements from the new framework version in the client version without breaking or removing changes on the client side, was a very painful process that took a lot of time and at least two people for an extended period: one from the client team, one from the framework team.

With CC it is much less painful: it will do the merge (it is not allowed, by hooks, to touch the tests), run the client tests and the new framework tests, and report the difference. That difference is usually evaluated by someone from the client team, who will then merge and fix the tests (mostly manually) to reflect the new reality, then test the system manually. Claude misses things (especially when functionalities are very similar but not exactly the same; it cannot really pick which to take, so it usually does nothing), but the bulk of the work is done quickly and usually without causing issues.
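The "not allowed, by hooks, to touch the tests" guardrail can be sketched as a small pre-edit hook script. This is a hypothetical sketch, not the commenter's actual setup: it assumes the agent harness invokes a hook with the target file path as its first argument and treats a non-zero exit status as "reject this edit" (adapt the patterns and the invocation convention to your harness):

```shell
#!/bin/sh
# guard_tests.sh -- hypothetical pre-edit hook: block agent edits to tests.
# Usage (assumed harness convention): guard_tests.sh <file-path>
# Exit 0 = allow the edit, exit 2 = block it.

path="$1"

case "$path" in
    tests/*|*/tests/*|test_*.py|*/test_*.py|*_test.py|*/*_test.py)
        # The path matches a protected test pattern; refuse the edit.
        echo "blocked: $path is a protected test file" >&2
        exit 2
        ;;
    *)
        exit 0
        ;;
esac
```

Keeping the tests read-only for the agent is what makes its "run both test suites and report the difference" output trustworthy: the diff reflects real behavioral changes, not tests quietly rewritten to pass.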