Comment by sd9
17 hours ago
I've recently got into red/green TDD with Claude Code, and I have to agree that it seems like the right way to go.
As my projects were growing in complexity and scope, I found myself worrying that we were building things that would subtly break other parts of the application. Because of the limited context windows, it was clear that after a certain size, Claude kind of stops understanding how the work you're doing interacts with the rest of the system. Tests help protect against that.
Red/green TDD specifically keeps the current work focused on the thing you're actually trying to accomplish, in that you can observe a concrete change in behaviour (a failing test turning green) as a result of your change, with the added benefit of growing the test suite over time.
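To make that concrete, the red step is just a test written before the code it describes exists - something like this sketch (assuming a vitest-style runner; applyDiscount and the pricing module are made up for illustration):

    // RED: written first and expected to fail - ./pricing doesn't exist yet
    import { describe, it, expect } from "vitest";
    import { applyDiscount } from "./pricing"; // hypothetical module

    describe("applyDiscount", () => {
      it("takes 10% off orders over $100", () => {
        expect(applyDiscount({ total: 200 })).toBe(180);
      });
    });

Only once this fails for the right reason do you let the agent write the implementation that turns it green.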
It's also easier than ever to create comprehensive integration test suites - my most valuable tests exercise entire user-facing workflows purely through the UI, against a real backend.
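Something along these lines, say (a Playwright sketch; the URL, labels, and flow are invented for illustration):

    // Drives a signup flow through the UI only, against a real backend
    import { test, expect } from "@playwright/test";

    test("user can sign up and reach the dashboard", async ({ page }) => {
      await page.goto("https://staging.example.com/signup"); // hypothetical env
      await page.getByLabel("Email").fill("test@example.com");
      await page.getByLabel("Password").fill("correct-horse-battery");
      await page.getByRole("button", { name: "Create account" }).click();
      // no mocks: the account really gets created on the backend
      await expect(page.getByText("Welcome to your dashboard")).toBeVisible();
    });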
Red/green is especially good with Claude because even now with Opus 4.6, Claude can throw out a little comment like “//Implementation on hold until X/Y/Z: return { true }” and proceed to completely skip the implementation, on the strength of that inline comment, for a longgg time. It used to do this aggressively even in the tests, but by and large red/green prompting helps immensely - it tells the agent “think of failing tests as SUCCESS right now” - and then you’ll get lots of them.
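The prompt nudge itself doesn't need to be fancy; the gist of what goes into the red phase is roughly this (exact wording is just an example):

    We are doing red/green TDD and are currently in the RED phase.
    Write failing tests for the new behaviour ONLY - do not implement anything.
    A test that fails for the right reason is SUCCESS right now;
    a passing test at this stage means you implemented too early.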
I’ve always been partial to integration tests too. Hand-coding made integration tests feel bad: you’re almost doubling the code output in some cases - especially if you end up needing to mock a bunch of servers. Nowadays that’s cheap, which is super helpful.
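To put a shape on the “doubling the code output” part: every external service the flow touches used to need hand-written stubs, something like this (a sketch using msw; the endpoint and payload are invented):

    import { beforeAll, afterEach, afterAll } from "vitest";
    import { http, HttpResponse } from "msw";
    import { setupServer } from "msw/node";

    // One handler per endpoint the workflow touches - it adds up fast
    const server = setupServer(
      http.get("https://api.example.com/users/:id", () =>
        HttpResponse.json({ id: "1", name: "Test User" })
      )
    );

    beforeAll(() => server.listen());
    afterEach(() => server.resetHandlers());
    afterAll(() => server.close());

Writing and maintaining piles of that by hand was the tax; now the agent does it.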
Yeah, I've always _preferred_ integration tests, but the cost of building them was so great. Now the cost is effectively eliminated, and if you make a change that genuinely does affect an integration test (changing the text on a button, for example) it's easy to smart-find-and-replace and fix them up. So I'm using them a lot more.
The only problem is... they still take much longer to _run_ than unit tests, and they do tend to be more flaky (although Claude is helpful in fixing flaky tests too). I'm grateful for the extra safety, but it makes deployments that much slower. I've not really found a solution to that part beyond parallelising.
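For what it's worth, the parallelising itself is mostly config; with Playwright it's roughly this (a sketch - tune workers and retries to your CI):

    // playwright.config.ts
    import { defineConfig } from "@playwright/test";

    export default defineConfig({
      fullyParallel: true,                      // parallelise tests within files too
      workers: process.env.CI ? 4 : undefined,  // cap workers on CI
      retries: process.env.CI ? 2 : 0,          // blunt instrument against flakes
    });

Beyond that you can split the suite across machines with npx playwright test --shard=1/3 and so on, but the wall-clock cost never fully goes away.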
Granted, it doesn't always pay attention to Claude.md, but one thing I've done is add a rule to the block of rules it must always follow: never leave anything unimplemented with placeholders unless explicitly told to do so. It's made this mostly go away for me.
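For reference, the rule is just plain prose in the file; mine is along the lines of:

    Hard rules - always follow:
    - NEVER leave code unimplemented or stubbed out with placeholders,
      TODO comments, or early returns unless explicitly asked for a stub.
    - If you cannot complete an implementation, say so in your response
      instead of silently stubbing it and moving on.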