
Comment by zem

1 day ago

i just ran into a concrete example of why i would not want to run a tree of unsupervised agents churning out code. i have a project that generates large but repetitive .docx documents. i asked claude to add some graphics to it, and it did a very good job of figuring out the xml graphics elements, locating where in the document structure it could insert them, and even printing to pdf and checking visually to get them lined up perfectly with the text. it took about 5 minutes; i would likely have spent an hour doing all that from scratch, including several trips to google.

then i looked at the code and asked it to benchmark, hinting that it looked like it was doing a lot in the inner loop. and sure enough, adding a few simple graphics to every page more than doubled the time it took to generate the largest size of document (~1s -> ~2.2s for ~400 pages). without any more prompting, claude figured out that it had an accidentally-quadratic loop and fixed that.
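the comment doesn't say what the quadratic loop actually was, so this is only a hypothetical illustration of one common way such a loop sneaks in: rebuilding the output list on every page instead of appending to it. the function names, the `PAGE_BREAK` snippet and the idea that pages are pre-rendered xml strings are all assumptions, not the project's code.

```python
# hypothetical page-break paragraph in WordprocessingML; illustrative only
PAGE_BREAK = '<w:p><w:r><w:br w:type="page"/></w:r></w:p>'


def add_graphics_quadratic(pages, graphic_xml):
    """hypothetical slow version: copies the whole output list on every page."""
    out = []
    for page in pages:
        # `out + [...]` allocates a brand-new list and copies every element
        # already collected, so total work grows with the square of the page count
        out = out + [page, graphic_xml, PAGE_BREAK]
    return "".join(out)


def add_graphics_linear(pages, graphic_xml):
    """same output, but extends the list in place (amortized O(1) per page)."""
    out = []
    for page in pages:
        out.extend((page, graphic_xml, PAGE_BREAK))
    return "".join(out)
```

both functions produce identical xml; only the growth of the work with page count differs, which is exactly the kind of thing that only shows up once you benchmark the largest documents.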

i then had to tell it "look, we are using a template to avoid regenerating boilerplate with every page. you can add a placeholder to the template and replace it with graphics using the xml patching code you already wrote for another part of the doc generation". the final code was a lot cleaner and ran in ~1.2s, and claude (again unprompted, to its credit) did fine-grained benchmarking to show that the remaining time was the unavoidable overhead of simply inserting all those large chunks of xml into the document.
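a minimal sketch of the placeholder idea, assuming the per-page template is a plain xml string with `{{NAME}}`-style markers. the marker syntax, `PAGE_TEMPLATE` and `fill_template` are hypothetical (the comment doesn't show the real template or patching code), and `<w:drawing/>` stands in for the much larger drawingml chunk a real graphic needs.

```python
import re
from xml.sax.saxutils import escape

# hypothetical per-page template: boilerplate is written once, with markers
# where per-page content (including the graphic) gets patched in
PAGE_TEMPLATE = (
    "<w:p><w:r><w:t>{{TITLE}}</w:t></w:r></w:p>"
    "{{GRAPHIC}}"
)

PLACEHOLDER = re.compile(r"\{\{(\w+)\}\}")


def fill_template(template, values):
    # single pass over the template: each marker is looked up once and the
    # inserted text is not rescanned, so a title containing "{{GRAPHIC}}"
    # can't be accidentally expanded. raises KeyError for a missing value.
    return PLACEHOLDER.sub(lambda m: values[m.group(1)], template)


def render_pages(titles, graphic_xml):
    # the graphic xml is built once and reused for every page
    return "".join(
        fill_template(PAGE_TEMPLATE, {"TITLE": escape(t), "GRAPHIC": graphic_xml})
        for t in titles
    )
```

the point of the design is that per-page work is just string substitution into pre-built boilerplate, so what's left is mostly the cost of copying the graphic xml itself into each page.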

i wouldn't even say it was a coincidence that i ran into this right after writing my comment about having to micromanage the LLM, because this sort of thing happens all the time. i had a much easier time of it because i looked at the code generated in a single commit and could easily see that it smelt off. i would not have wanted to do this at the end of 20 commits all building on each other.