Comment by bob1029

2 months ago

I saw a major uplift in performance after I combined tools like apply_patch with check_compilation & run_unit_tests. I still call the tool "apply_patch", but it now returns additional information about the build & tests if the patch succeeds. The agent went from ~80% success rate to what seems to be deterministic (so far). I don't bother to describe the compilation and unit testing processes in my prompts anymore. All I need to do is return the results of these things after something triggers them to run as a dependency.
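To make the shape of this concrete, here is a minimal Python sketch of a combined tool, not the actual harness. The names `run_step`, `patch_fn`, `build_cmd` and `test_cmd` are hypothetical, and it assumes the build and test steps are ordinary commands that signal failure with a non-zero exit code.

```python
import subprocess
import sys
from typing import Callable, Sequence


def run_step(cmd: Sequence[str], cwd: str = ".") -> dict:
    """Run one command and capture its outcome for the agent to read."""
    proc = subprocess.run(cmd, cwd=cwd, capture_output=True, text=True)
    return {"ok": proc.returncode == 0,
            "output": (proc.stdout + proc.stderr).strip()}


def apply_patch(patch_fn: Callable[[], None],
                build_cmd: Sequence[str],
                test_cmd: Sequence[str],
                cwd: str = ".") -> dict:
    """Apply a patch, then build and test as dependent steps.

    patch_fn is a hypothetical callable that applies the edit and raises
    on failure. Later steps run only if the earlier ones succeeded, and
    every step that ran is reported back in a single tool result.
    """
    try:
        patch_fn()
    except Exception as exc:
        # The patch did not apply, so there is nothing to build or test.
        return {"patch": {"ok": False, "output": str(exc)}}

    report = {"patch": {"ok": True, "output": ""}}
    report["build"] = run_step(build_cmd, cwd)
    if report["build"]["ok"]:
        report["tests"] = run_step(test_cmd, cwd)
    return report


if __name__ == "__main__":
    # Stand-in commands; a real harness would call its compiler and test runner.
    print(apply_patch(lambda: None,
                      [sys.executable, "-c", "pass"],
                      [sys.executable, "-c", "print('1 passed')"]))
```

The point of the design is that the model never has to decide to build or test: it only ever sees one tool, and the result of that tool already carries whatever feedback applies.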

I feel like I'm falling out of whatever is popular these days. I've been using prepaid tokens and custom harnesses for a long time now. It just seems to work. I can ignore most of the news. Copilot & friends are currently dead to me for the problems I've expressly targeted. For some codebases they aren't even in the same room performance-wise anymore, despite using the exact same GPT5.4 base model.

3 comments

modo_ 2 months ago

I like this - I think you're not too far off of what's popular these days though. I think similar functionality can be achieved by using the "hook" functionality in claude code / codex.

bostonvaulter2 2 months ago

Can you explain in more detail how you implemented those tools? Is that via a MCP server?

bob1029 2 months ago

> Is that via a MCP server?
No, this is all in one application. A Winforms+WebView2 app wraps the chat completion APIs and implements the various tools directly.
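As a rough illustration of what "implements the various tools directly" can look like, here is a minimal Python sketch of in-process tool dispatch, not the Winforms app itself. It assumes tool calls arrive in the common chat-completions shape (an `id` plus a `function` with a `name` and JSON-encoded `arguments`); the `TOOLS` registry, the `tool` decorator and the `word_count` tool are hypothetical.

```python
import json
from typing import Callable, Dict

# Hypothetical registry mapping tool names to plain local functions.
TOOLS: Dict[str, Callable[..., str]] = {}


def tool(fn: Callable[..., str]) -> Callable[..., str]:
    """Register a function under its own name so the model can call it."""
    TOOLS[fn.__name__] = fn
    return fn


@tool
def word_count(text: str) -> str:
    """Toy example tool: count whitespace-separated words."""
    return str(len(text.split()))


def dispatch(tool_call: dict) -> dict:
    """Run one tool call in-process and build the reply message.

    Errors are returned as text rather than raised, so the model can
    see what went wrong and try again.
    """
    fn_info = tool_call["function"]
    handler = TOOLS.get(fn_info["name"])
    if handler is None:
        content = f"error: unknown tool {fn_info['name']!r}"
    else:
        try:
            args = json.loads(fn_info["arguments"] or "{}")
            content = handler(**args)
        except Exception as exc:
            content = f"error: {exc}"
    return {"role": "tool",
            "tool_call_id": tool_call["id"],
            "content": content}
```

With everything in one process there is no separate server to run; the returned message is simply appended to the conversation before the next completion request.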