Comment by rbalicki

1 day ago

I'll add another use case for letting an AI go ham: many small, atomic refactors where the name of the game is never breaking anything.

My personal OSS projects don't have the scale to necessarily make this worth it, but at work I run three pipelines using Barnum (https://barnum-circus.github.io/). First, one that ingests files, identifies refactors (from a pre-approved list), and places a precise description of the refactor to be done in a queue; second, one that reads from said queue, implements and creates PRs (there is a lot of "check that the PR is correct" here as well); and a third that babysits PRs until they land. I've landed hundreds of PRs in this way, with very little effort on my part.
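To make the three stages concrete, here's a minimal sketch of the shape of it. This is not Barnum's API; every name below (`RefactorTask`, `APPROVED`, `identify`, `implement`, `babysit`) is hypothetical, and the "AI implements the change" and "CI passes" steps are stand-in callables you would supply yourself.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class RefactorTask:
    path: str
    refactor: str
    description: str


# Hypothetical pre-approved list: refactor name -> detector over file text.
APPROVED = {
    "replace-print-with-logging": lambda text: "print(" in text,
    "remove-trailing-whitespace": lambda text: any(
        line != line.rstrip() for line in text.splitlines()
    ),
}


def identify(files, queue):
    """Stage 1: scan files and enqueue one precise task per approved refactor found."""
    for path, text in sorted(files.items()):
        for name, detect in APPROVED.items():
            if detect(text):
                queue.append(
                    RefactorTask(path, name, f"Apply '{name}' to {path} only")
                )


def implement(queue, apply_and_verify):
    """Stage 2: drain the queue; keep only tasks whose change passed verification."""
    opened = []
    while queue:
        task = queue.popleft()
        # apply_and_verify is a stand-in for "implement, then check the PR is correct".
        if apply_and_verify(task):
            opened.append(task)
    return opened


def babysit(open_prs, ci_passes, max_rounds=3):
    """Stage 3: re-check each PR until CI passes or the attempts run out."""
    landed = []
    for pr in open_prs:
        for _ in range(max_rounds):
            # A real version would rebase or fix failures between rounds.
            if ci_passes(pr):
                landed.append(pr)
                break
    return landed
```

The point of the split is that each stage only has to do one narrow thing, and the queue entry carries a description precise enough that stage two never has to decide what to refactor.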

dmzxnico 18 hours ago

It's amazing at reverse engineering. See what's being done with GTA San Andreas now: the reverse-engineering effort started before AI existed, and since AI has been in the modders' hands it has sped up so much that they can finally understand the game more deeply and create bigger mods. They added Vice City inside the game in an arcade, and they built specific tools with AI to convert GTA 5 models to GTA SA. Pretty crazy and great.

port11 6 hours ago

My experience with Gemini and Sonnet is that refactors or TypeScript compilation errors can be solved by “have at it”, but with mixed results. Many TS issues go away with `as any/never`, and instructing the model not to do that doesn’t work very well.

frizlab 1 day ago

At $COMPANY, a coworker recently tried using fable to do a refactor where not breaking anything was the name of the game.

It broke something at the first PR.

I think we’re not there yet.

rbalicki 7 hours ago
Speculating here, but perhaps your coworker was too ambitious? In my opinion, you should start with AI-generated PRs that do small, linting refactors and then work up from there. In particular, if this is done in parts, one of the strategies you can employ is to:
- add tests
- break files up into smaller parts
- test the smaller parts
- then actually improve behavior

(Which is no different than what you would do as a human)
- frizlab 4 hours ago
  
  PR wasn’t big (+283/-232) and was indeed focused on a single module.
Schiendelman 11 hours ago
One of the best things you can do is start by having it do unit test coverage for existing behavior. A refactor with no tests breaks things pretty much no matter who does it, because they don't know what the right behavior is.
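A rough sketch of what "coverage for existing behavior" can look like: snapshot what the old code returns today, then require the refactor to match it exactly. The pricing functions and the `record_behavior` helper here are made up for illustration; the inputs you snapshot would come from your own code.

```python
def legacy_price(quantity, unit_cents):
    """Hypothetical legacy function whose exact behavior we want to preserve."""
    total = quantity * unit_cents
    if quantity >= 10:
        total = total * 90 // 100  # 10% bulk discount, truncating division
    return total


def refactored_price(quantity, unit_cents):
    """Hypothetical refactor: same rule, expressed without the mutation."""
    discount_percent = 10 if quantity >= 10 else 0
    return quantity * unit_cents * (100 - discount_percent) // 100


def record_behavior(fn, cases):
    """Snapshot current outputs so a refactor can be compared against them."""
    return {case: fn(*case) for case in cases}


# Include the boundary (9 vs 10), since that is where refactors tend to slip.
CASES = [(0, 199), (1, 199), (9, 199), (10, 199), (11, 333)]

golden = record_behavior(legacy_price, CASES)

# The refactor is only acceptable if every recorded case still matches.
assert record_behavior(refactored_price, CASES) == golden
```

The snapshot doesn't say whether the old behavior is right, only that it hasn't changed, which is the property a pure refactor needs.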
- frizlab 3 hours ago
  
  While I could generally agree, in this specific instance if the AI were “thinking” correctly it should have found the mistake. I admit it was a difficult problem though (solving it required creativity).
  To be more precise, the prompt actually pointed to where there could be issues, and the issue, which was exactly of the kind that was pointed at, was not found.
sunrunner 1 day ago
I've found that adding "Make no mistakes." to my prompt usually helps with this kind of problem...
- cubano 1 day ago
  
  perhaps simply threatening to fire it would also do the trick...it sure has worked well on us for a long time now.
  
- dozerly 1 day ago
  
  We are so many layers deep in AI hype that I honestly can’t tell if this is /s or not
  
- lemming 1 day ago
  
  Or if the code is really important, sometimes even “please make no mistakes” is necessary.
  