Comment by vidarh

7 months ago

It doesn't stop them from making stupid mistakes. It does reduce the amount of time I have to spend dealing with the stupid mistakes that they know how to fix once the problem is pointed out to them, so that I can concentrate on more focused diffs of cleaner code.

E.g. a real example: At one point early on, the tooling I mentioned made the correct functional change but left the old version of the method in place. It's written in Ruby, and Ruby allows defining a method multiple times in the same class - the later definition just overrides the earlier one. This would of course be a compilation error in most other languages. It's a weakness of using Ruby with a careless (or mindless) developer...
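
A minimal illustration of that behaviour (the class and method names are made up for the example, not taken from the actual code):

```ruby
# Hypothetical class, only to show how redefinition behaves.
class Invoice
  def total
    :old_version
  end

  # Same name, same class: Ruby silently replaces the definition above
  # instead of raising an error.
  def total
    :new_version
  end
end

Invoice.new.total # => :new_version
```

Nothing fails at load time; the first definition is simply discarded.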

But RuboCop - a linter - will catch it. So forcing all changes through RuboCop and just returning the errors to the LLM made it recognise the mistake and delete the old method.
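
Roughly, that loop could look like the sketch below. This is not the actual tooling: the helper names are hypothetical, it assumes the `rubocop` executable is on the PATH, and it relies only on RuboCop's JSON formatter (`--format json`).

```ruby
require "json"
require "open3"

# Turn a RuboCop JSON report into short plain-text lines that can be
# handed back to the model as-is.
def offenses_to_feedback(report_json)
  report = JSON.parse(report_json)
  report.fetch("files", []).flat_map do |file|
    file.fetch("offenses", []).map do |offense|
      line = offense.dig("location", "start_line")
      "#{file["path"]}:#{line}: #{offense["cop_name"]}: #{offense["message"]}"
    end
  end
end

# Run RuboCop on the changed files and return the feedback lines
# (an empty array when the files are clean).
# Assumption: `rubocop` is installed and prints a JSON report to stdout.
def lint_feedback(paths)
  stdout, _stderr, _status = Open3.capture3("rubocop", "--format", "json", *paths)
  offenses_to_feedback(stdout)
end

# Usage idea (send_to_llm is a placeholder, not a real API):
#   feedback = lint_feedback(changed_files)
#   send_to_llm(feedback.join("\n")) unless feedback.empty?
```

Keeping the parsing separate from the process call makes the formatting step easy to test without RuboCop installed.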

It lowers the cognitive load of the review. Instead of having to wade through and resolve a lot of cruft and make sense of unusually structured code, you can focus on the actual specific changes and subject those to more scrutiny.

And then my plan is to experiment with more semantic checks in the same style as RuboCop's, but less prescriptive, of the type "maybe you should pay extra attention here, and explain why this is correct/safe" etc. An example might be to trigger this for any change that involves reading a key, password field or card number, whether or not there is a problem with it, both to make the LLM "look twice" and to mark it as an area that needs extra attention in a human review.
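
As a rough sketch of what such a check might look like (the pattern list, struct and prompt wording are all hypothetical, and a plain regex over diff lines is a much cruder stand-in for a real semantic check):

```ruby
# Hypothetical patterns; a real list would be tuned to the codebase.
SENSITIVE = /password|passwd|secret|api_?key|token|card_?number/i

# diff_line is the 1-based line number within the diff text itself.
AttentionNote = Struct.new(:diff_line, :text, :prompt)

# Scan the added lines of a unified diff and flag every one that touches
# a sensitive-looking name, whether or not anything is actually wrong.
def attention_notes(diff_text)
  notes = []
  diff_text.each_line.with_index(1) do |line, number|
    # Only added lines count; skip the "+++ b/file" header.
    next unless line.start_with?("+") && !line.start_with?("+++")
    next unless line.match?(SENSITIVE)

    notes << AttentionNote.new(
      number,
      line[1..-1].strip,
      "Touches a sensitive field: explain why this change is correct and safe."
    )
  end
  notes
end
```

The same notes could go both back to the LLM as a "look twice" prompt and into the review as markers for the human reader.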

It doesn't need to be perfect, it just needs to provide enough of a harness to make it easier for humans in the loop to spot the remaining issues.

Right, so you understand that any dev who already uses, for example, GitHub Copilot with various code syntax extensions already achieves whatever it is that your new service is delivering? I'd spare myself the effort if I were you.

  • It didn't start with the intent of being a service; I started with it because there were a number of things that Copilot or tools like Claude Code don't do well enough, and that annoyed me. Spending a few hours was sufficient to get it to the point where it's now my primary coding assistant, because it works better for me for my stack, and because I can evolve it further to solve the specific problems I need solved.

    So, no, I'll keep doing this because doing this is already saving me effort for my other projects.