Comment by iepathos

6 days ago

The default output from AI is much like the default output from experienced devs prioritizing speed over architecture to meet business objectives. Just like experienced devs, LLMs accept technical debt as leverage for velocity. This isn't surprising - most code in the world carries technical debt, so that's what the models were trained on and learned to optimize for.

Technical debt, like financial debt, is a tool. The problem isn't its existence, it's unmanaged accumulation.

A few observations from my experience:

1. One-shotting - if you're prompting once and shipping, you're getting the "fast and working" version, not the "well-architected" version. Same as asking an experienced dev for a quick prototype.

2. AI can output excellent code - but it takes iteration, explicit architectural constraints, and often specialized tooling. The models have seen clean code too; they just need steering toward it.

3. The solution isn't debt-free commits. The solution is measuring, prioritizing, and reducing only the highest-risk tech debt - the equivalent of focusing on bottlenecks with performance profiling. Which code is high-risk? Where's the debt concentrated? Poorly factored code with good test coverage is low-risk. Poorly tested code in critical execution paths is high-risk. Your CI pipeline needs to check debt automatically for you, just like it lints and checks that your tests pass (a rough sketch of such a gate is below).
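
To make that CI point concrete, here's roughly what a debt gate can look like. The `debt-report.json` format, the field names, and the budget are made up for illustration - this is not debtmap's actual output - the point is just that the check runs alongside lint and tests and fails the build the same way:

```python
#!/usr/bin/env python3
"""Hypothetical CI gate: fail the build if high-risk debt exceeds a budget.

Assumes some analyzer has already written debt-report.json with entries like
{"path": "src/auth.rs", "complexity": 18, "coverage": 0.42, "critical": true}.
"""
import json
import sys

MAX_HIGH_RISK_ITEMS = 5  # arbitrary budget, tune per project

def is_high_risk(item):
    # High risk = complex, poorly tested, and on a critical execution path.
    return item["critical"] and item["complexity"] > 10 and item["coverage"] < 0.6

def main():
    with open("debt-report.json") as f:
        items = json.load(f)
    high_risk = [i for i in items if is_high_risk(i)]
    for item in high_risk:
        print(f"high-risk debt: {item['path']} "
              f"(complexity {item['complexity']}, coverage {item['coverage']:.0%})")
    if len(high_risk) > MAX_HIGH_RISK_ITEMS:
        print(f"debt gate failed: {len(high_risk)} high-risk items "
              f"(budget {MAX_HIGH_RISK_ITEMS})")
        sys.exit(1)

if __name__ == "__main__":
    main()
```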

I built https://github.com/iepathos/debtmap to solve this systematically for my projects. It measures technical debt density to prioritize risk, but more importantly for this discussion: it identifies the right context for an LLM to understand a problem without looking through the whole codebase. The output is designed to be used with an LLM for automated technical debt reduction. And because we're measuring debt before and after, we have a feedback loop - enabling the LLM to iterate effectively and see whether its refactoring had a positive impact or made things worse. That's the missing piece in most agentic workflows: measurement that closes the loop.
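
The loop itself is simple to sketch. The command names and the single-number score below are placeholders rather than debtmap's real CLI (and I'm assuming a Rust project for the test step); the shape is what matters: measure, let the agent refactor, measure again, and only keep the change if the score improved and tests still pass.

```python
import subprocess

def debt_score() -> float:
    # Placeholder: run whatever debt analyzer you use and parse a single score.
    out = subprocess.run(["your-debt-tool", "--score"],
                         capture_output=True, text=True, check=True)
    return float(out.stdout.strip())

def tests_pass() -> bool:
    # Assumes a Rust project; swap in your own test command.
    return subprocess.run(["cargo", "test"], capture_output=True).returncode == 0

before = debt_score()
# ... the agent applies its refactoring here, outside this sketch ...
after = debt_score()

if tests_pass() and after < before:
    print(f"keep it: debt {before:.1f} -> {after:.1f}")
else:
    print("revert: the refactor made things worse or broke tests")
    subprocess.run(["git", "checkout", "--", "."])
```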

To your specific concern about shipping unreviewed code: I agree it's risky, but the review focus should shift from "is every line perfect?" to "where are the structural risks, and are those paths well tested?" If your code has low complexity everywhere, is well tested (always review the tests), and everything passes, ask yourself what you actually gain from spending more of your time over-engineering the lesser tech debt away. You can't eliminate all tech debt, but you can keep it from compounding in the places that matter.