← Back to context

Comment by judahmeek

3 hours ago

Something is missing in the common test suite if this can occur, right?

First, it's not "can occur" but does occur 100% of the time. Second, sure, it does mean something is missing, but how do you test for "this codebase can withstand at least two years of evolution"?

You can spend a lot of time perfecting the test suite to meet your specific requirements and needs, but I think that would take quite a while, and at that point, why not just write the code yourself? I think the most viable approach of today's AI is still to let it code and steer it when it makes a decision you don't like, as it goes along.

You have to fight to get agents to write tests in my experience. It can be done, but they don't. I've yet to figure out how get any any agent to use TDD - that is write a test and then verify it fails - once in a while I can get it to write one test that way, but it then writes far more code to make it pass than the test justifies and so is still missing coverage of important edge cases.

  • I have TDD flow working as a part of my tasks structuring and then task completion. There are separate tasks for making the tests and for implementing. The agent which implements is told to pick up only the first available task, which will be “write tests task”, it reliably does so. I just needed to add how it should mark tests as skipped because it’s been conflicting with quality gates.