Comment by andrepd
9 days ago
I don't understand this. How does it slow your development if the tests being green is a necessary condition for the code being correct? Yes, it slows you down compared to just writing incorrect code, lol, but that's not the point.
"Brittle" here means either:
1) your test is specific to the implementation at the time of writing, not the business logic you mean to enforce.
2) your test has non-deterministic behavior (more common in end-to-end tests) that causes it to fail some small percentage of the time on repeated runs.
At the extreme, these types of tests degenerate your suite into a "change detector," where any modification to the code-base is guaranteed to make one or more tests fail.
They slow you down because every code change also requires an equal or larger investment in debugging the test suite, even if nothing actually "broke" from a functional perspective.
Using LLMs to litter your code-base with low-quality tests will not end well.
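To make the first kind of brittleness concrete, here's a minimal sketch in Python (the `apply_discount` function and its `round_to_cents` helper are hypothetical, invented for illustration) of a test that pins the implementation next to one that pins the behavior:

```python
from unittest import mock

import pytest


def round_to_cents(amount: float) -> float:
    """Internal helper; an implementation detail."""
    return round(amount, 2)


def apply_discount(price: float, percent: float) -> float:
    """The behavior we actually care about."""
    return round_to_cents(price * (1 - percent / 100))


# Brittle "change detector": asserts HOW the result is computed.
# Inlining the rounding helper during a refactor turns this red even
# though nothing observable about apply_discount changed.
def test_discount_uses_rounding_helper():
    # Patch the helper in this module so we can spy on the internal call.
    with mock.patch(f"{__name__}.round_to_cents", wraps=round_to_cents) as spy:
        apply_discount(price=100.0, percent=10)
        spy.assert_called_once_with(90.0)


# Robust: asserts WHAT the function does, the behavior worth preserving.
def test_discount_reduces_price_by_percentage():
    assert apply_discount(price=100.0, percent=10) == pytest.approx(90.0)
```

The first test fails on any refactor of the internals; the second only fails when the price actually comes out wrong.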
The problem is that sometimes it is not a necessary condition. Rather, the tests might have been checking implementation details or just been wrong in the first place. Now, when a test fails, I have extra work to figure out whether it's a real break or just a bad test.
The goal of tests is not to prevent you from changing the behavior of your application. The goal is to preserve important behaviors.
If you can't tell if a test is there to preserve existing happenstance behavior, or if it's there to preserve an important behavior, you're slowed way down. Every red test when you add a new feature is a blocker. If the tests are red because you broke something important, great. You saved weeks! If the tests are red because the test was testing something that doesn't matter, not so great. Your afternoon was wasted on a distraction. You can't know in advance whether something is a distraction, so this type of test is a real productivity landmine.
Here's a concrete, if contrived, example. You have a test that starts your app up in a local webserver and requests /foo, expecting to get the contents of /foo/index.html. One day, you upgrade your web framework, and it has decided to return a 302 redirect to /foo/index.html, so that URLs are always canonical now. Your test fails with "incorrect status code; got 302, want 200".
So now what? Do you not apply the version upgrade? Do you rewrite the test to check for a 302 instead of a 200? Do you adjust the test HTTP client to follow redirects silently?
The problem here is that you checked for something you didn't care about, the HTTP status, instead of only checking for what you cared about: that "GET /foo" gets you some text you're looking for. In a world where you let the HTTP client follow redirects, like human-piloted HTTP clients do, and only checked for what you cared about, you wouldn't have had to debug this to apply the web framework security update. But since you tightened down the screws constraining your application as tightly as possible, you're here debugging this instead of doing something fun.
(The fun doubles when you have to run every test for every commit before merging, and this one failure happened 45 minutes in. Goodbye, the rest of your day!)
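Roughly, the difference between the two styles looks like this (a sketch in Python with `requests` and pytest; the base URL and the expected page text are placeholder assumptions):

```python
import requests

BASE_URL = "http://localhost:8080"  # placeholder for the app under test


# Brittle: pins the status code, an artifact of the framework's routing.
# After the upgrade that canonicalizes URLs with a redirect, this goes red
# even though users still see the same page.
def test_foo_returns_200_with_index_contents():
    resp = requests.get(f"{BASE_URL}/foo", allow_redirects=False)
    assert resp.status_code == 200
    assert "Welcome to Foo" in resp.text


# Robust: checks only what we care about. Like a browser, the client
# follows redirects silently, so the framework upgrade doesn't break it.
def test_foo_serves_index_contents():
    resp = requests.get(f"{BASE_URL}/foo")  # follows redirects by default
    assert resp.ok
    assert "Welcome to Foo" in resp.text
```

The second test still preserves the behavior that matters, that GET /foo renders the index content, but it doesn't care how the framework chooses to get you there.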
This example also smells a lot like "overfitting" in AI training.
It's that hard to write specs that truly match the business, which is why test-driven development and specification-first never took off as movements.
Asking specs to truly match the business before we begin using them as tests would handcuff test people in the same way we're saying that tests have the potential to handcuff app and business logic people — as opposed to empowering them. So I wouldn't blame people for writing specs that only match the code implementation at that time. It's hard to engage in prophecy.
The problem with TDD is that people assumed it was writing a specification, or tried to map it directly to post-hoc testing and metrics.
TDD at its core is defining expected inputs and mapping those to expected outputs at the unit-of-work level, e.g. a function or class.
While UAT and the domain inform what those input/output pairs are, avoiding the temptation to write a broader spec than that is what many people struggle with when learning TDD.
Avoiding behavior or acceptance tests, and focusing on tests at the unit of implementation, is the whole point.
But it is challenging for many to get that to click. It should help you find ambiguous requirements, not develop a spec.
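As a sketch of what that looks like (Python with pytest; `normalize_phone` and its cases are hypothetical), the test is nothing more than expected inputs mapped to expected outputs for one unit of work, written before the implementation exists:

```python
import pytest


# Written first: each case is an expected input mapped to an expected
# output at the unit-of-work level. No broader spec, just the contract
# of this one function.
@pytest.mark.parametrize(
    ("raw", "expected"),
    [
        ("(555) 867-5309", "+15558675309"),
        ("555.867.5309", "+15558675309"),
        ("+1 555 867 5309", "+15558675309"),
    ],
)
def test_normalize_phone(raw, expected):
    assert normalize_phone(raw) == expected


# Written second: just enough implementation to make the cases pass.
def normalize_phone(raw: str) -> str:
    digits = "".join(ch for ch in raw if ch.isdigit())
    if len(digits) == 10:  # assume a US number missing its country code
        digits = "1" + digits
    return "+" + digits
```

Writing the cases first is also where the ambiguity surfaces: deciding what a bare ten-digit number should normalize to forces a question back to the domain, without ever drafting a broader spec.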
I literally do the diametric opposite of you and it works extremely well.
I'm weirded out by your comment. Writing tests that couple to low-level implementation details was something I thought most people did accidentally before giving up on TDD, not intentionally.
> So I wouldn't blame people for writing specs that only match the code implementation at that time.
WTF are you doing writing specs based on the implementation? If you already have the implementation, what are you using the specs for? Or, if you want to apply this directly to tests: if you are already assuming the program is correct, what are you trying to test?
Are you talking about rewriting applications?
Where do you work if you don’t need to reverse engineer an existing implementation? Have you written everything yourself?