← Back to context

Comment by bob1029

12 hours ago

I saw a major uplift in performance after I combined tools like apply_patch with check_compilation & run_unit_tests. I still call the tool "apply_patch", but it now returns additional information about the build & tests if the patch succeeds. The agent went from ~80% success rate to what seems to be deterministic (so far). I don't bother to describe the compilation and unit testing processes in my prompts anymore. All I need to do is return the results of these things after something triggers them to run as a dependency.

I feel like I'm falling out of whatever is popular these days. I've been using prepaid tokens and custom harnesses for a long time now. It just seems to work. I can ignore most of the news. Copilot & friends are currently dead to me for the problems I've expressly targeted. For some codebases it's not even in the same room of performance anymore, despite using the exact same GPT5.4 base model.