Comment by vitally3643

1 day ago

That's because you're treating the problem as an engineer instead of an "influencer" or "10xer" or whatever. You're treating it as a problem to be solved with engineering and AI is merely a tool to do so. It is, in my experience, vanishingly rare for an engineer to have a problem that needs to be solved with multiple hours of unattended AI code generation.

I've only found one single application where it makes even the slightest amount of sense to have an AI grind away for hours on end. I'm reverse engineering a widget which contains five separate firmware images. I've dumped the binary from the widget and I set the AI to decompile and reverse engineer these interrelated firmware projects. It's a complex task, but very well bounded. It's not complicated work, but it's a lot of work, and the end result is a C-shaped pile of text that is only informative; it would never be compilable on its own even if I did it by hand. The quality of the output is tightly bounded by the input assembly and the overall output artifact is documentation in the shape of code.

I don't have any qualms about letting an AI go ham on it unattended because the stakes are zero. But if the AI can beat the assembly into a recognizable C project, it's much easier for me to read and reason about. Easy win, I think.

I'll add another use case for letting an AI go ham: many small, atomic refactors where the name of the game is never breaking anything.

My personal OSS projects don't have the scale to necessarily make this worth it, but at work I run three pipelines using Barnum (https://barnum-circus.github.io/). First, one that ingests files, identifies refactors (from a pre-approved list), and places a precise description of the refactor to be done in a queue; second, one that reads from said queue, implements and creates PRs (there is a lot of "check that the PR is correct" here as well); and a third that babysits PRs until they land. I've landed hundreds of PRs in this way, with very little effort on my part.
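The first two stages described above can be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not Barnum's API; the refactor names, the regex-based detection, the `RefactorTask` type, and the stand-in `implement` callback are all hypothetical.

```python
import re
from collections import deque
from dataclasses import dataclass

# Hypothetical pre-approved list: refactor name -> pattern that flags a candidate line.
APPROVED_REFACTORS = {
    "replace-var-with-let": re.compile(r"\bvar\s+\w+"),
    "strict-equality": re.compile(r"[^=!]==[^=]"),
}


@dataclass(frozen=True)
class RefactorTask:
    path: str
    line_no: int
    refactor: str
    description: str


def identify_refactors(path, source):
    """Stage 1: scan one file and describe each pre-approved refactor found."""
    tasks = []
    for line_no, line in enumerate(source.splitlines(), start=1):
        for name, pattern in APPROVED_REFACTORS.items():
            if pattern.search(line):
                description = f"{path}:{line_no}: apply '{name}' to: {line.strip()}"
                tasks.append(RefactorTask(path, line_no, name, description))
    return tasks


def drain(queue, implement):
    """Stage 2: take tasks off the queue one at a time and hand each to `implement`."""
    results = []
    while queue:
        task = queue.popleft()
        results.append(implement(task))
    return results


# Stage 1 fills the queue with precise, small descriptions.
queue = deque()
files = {"a.js": "var x = 1;\nif (x == 1) { run(); }\n"}
for path, source in files.items():
    queue.extend(identify_refactors(path, source))

# Stand-in for the step that would ask a model for a patch, verify it, and open a PR.
prs = drain(queue, lambda task: f"PR for {task.refactor} at {task.path}:{task.line_no}")
```

The point of the split is that each queued item is one atomic, pre-approved change, so a bad result costs one small PR rather than a large tangled one.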

  • My experience with Gemini and Sonnet is that refactors or TypeScript compilation errors can be solved by “have at it”, but with mixed results. Many TS issues go away with `as any` or `as never`, and instructing the model not to do that doesn’t work very well.
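    If prompting alone doesn't stop the casts, one option is a mechanical check on the diff before a PR is accepted. The following is a rough sketch under that assumption; the function name and sample diff are made up, and a plain regex like this will also flag matches inside comments or strings.

```python
import re

# Matches TypeScript escape-hatch casts such as `as any` or `as never`.
ESCAPE_CAST = re.compile(r"\bas\s+(any|never)\b")


def added_escape_casts(diff_text):
    """Return (line number within the diff, added text) for each added line with an escape cast."""
    hits = []
    for i, line in enumerate(diff_text.splitlines(), start=1):
        # Added lines start with '+'; lines starting with '+++' are file headers.
        if line.startswith("+") and not line.startswith("+++"):
            if ESCAPE_CAST.search(line):
                hits.append((i, line[1:].strip()))
    return hits


# Hypothetical unified diff produced by a model "fixing" type errors.
sample_diff = """\
--- a/user.ts
+++ b/user.ts
@@ -1,3 +1,3 @@
-const id: number = input.id;
+const id = input.id as any;
 const name = input.name;
+const tags = (input.tags ?? []) as never;
+const alias = input.alias as string;
"""

hits = added_escape_casts(sample_diff)
for line_no, text in hits:
    print(f"diff line {line_no}: {text}")
```

    A non-empty result can be used to reject the change and send it back, which tends to be more reliable than asking nicely.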

  • It's amazing at reverse engineering. See what people are doing with GTA San Andreas now: they started reversing it before AI existed, and since AI got into their hands the work has sped up so much that they can finally understand the game more deeply and create bigger mods. They added Vice City inside the game as an arcade, and they built specific tools with AI to convert GTA 5 models to GTA SA. Pretty crazy and great.

  • At $COMPANY, I recently had a coworker try fable for a refactor where not breaking anything was the name of the game.

    It broke something at the first PR.

    I think we’re not there yet.

    • Speculating here, but perhaps your coworker was too ambitious? In my opinion, you should start with AI-generated PRs that do small, linting refactors and then work up from there. In particular, if this is done in parts, one of the strategies you can employ is to:

      - add tests

      - break files up into smaller parts

      - test the smaller parts

      - then actually improve behavior

      (Which is no different than what you would do as a human)

    • One of the best things you can do is start by having it do unit test coverage for existing behavior. A refactor with no tests breaks things pretty much no matter who does it, because they don't know what the right behavior is.
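      A test that pins existing behavior might look roughly like this. It's only a sketch: `legacy_slug` is a made-up stand-in for whatever code is about to be refactored, and the expected values are recorded from what the code does today, not from what it arguably should do.

```python
import unittest


def legacy_slug(title):
    """Hypothetical existing function whose current behavior we want to preserve."""
    out = ""
    for ch in title.lower():
        if ch.isalnum():
            out += ch
        elif out and out[-1] != "-":
            # Collapse any run of non-alphanumeric characters into a single dash.
            out += "-"
    return out.strip("-")


class LegacySlugCharacterization(unittest.TestCase):
    # Expected values record current behavior, quirks included.
    CASES = {
        "Hello, World!": "hello-world",
        "  spaced   out  ": "spaced-out",
        "": "",
        "C++ & Rust": "c-rust",
    }

    def test_existing_behavior_is_pinned(self):
        for title, expected in self.CASES.items():
            with self.subTest(title=title):
                self.assertEqual(legacy_slug(title), expected)
```

      Run it with `python -m unittest` before the refactor and again after each PR; any change in output then shows up as a failing case instead of a surprise in production.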

How do you keep the info the AI generates concise?

I'm grappling with this at the moment. When I get it to do design or reverse engineering work, the wall of text gets bigger during investigation instead of being consolidated. It never pauses to create proper abstractions. This is on Opus, which starts getting wordy and performative on goals it can't easily verify.

  • Not the person you replied to, but I find that the process involves a steady stream of nudges and fixes to the workflow, plugging the gaps as they come along, until the rate of errors shrinks to an acceptable level.

    You may benefit from adding instructions like:

    - Be concise, especially when X

    - Do Y in this manner: [provide specific template or reference here]

    - When doing X, do Y and Z

    - If you notice issues, bring them to my attention instead of skipping past them.

    You can also add specific templates to assist certain stages. The more guardrails or bounding you can provide, the better. Start with small nudges, and strengthen them when they fail.

    It's a very unscientific process, but it's a worthwhile tradeoff once the workflow starts to hit its stride. Opus 4.8 is very good at following instructions, so don't be afraid to add them in.

    Just be careful not to add things that actively encumber the workflow... It's an art, not a science. (You can also tell the clanker to tell you when your workflow rules are making things worse.)

    It's annoyingly cybernetic, but these concepts have worked well for me. The curation of good process is essential to success with these damn things.

I thought most products had legal provisions that prohibit reverse engineering?

  • Yes, and most have the same legal power as the statement: By reading this comment you accept my terms and conditions and agree to pay me ten thousand dollars per word read.

  • Those provisions would broadly be civil (not criminal); the vendor would have to establish that you had reversed the blob, then take you to court, and then win.

    They could also try for criminal charges if you’re in a relevant jurisdiction.