Comment by lsb

12 hours ago

The real-world success they report reminds me of Simon Willison’s Red Green TDD: https://simonwillison.net/guides/agentic-engineering-pattern...

> Instead of taking a stab in the dark, Leanstral rolled up its sleeves. It successfully built test code to recreate the failing environment and diagnosed the underlying issue with definitional equality. The model correctly identified that because def creates a rigid definition requiring explicit unfolding, it was actively blocking the rw tactic from seeing the underlying structure it needed to match.
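
To make the quoted diagnosis concrete, here is a minimal Lean 4 sketch of the general effect, not the actual code from the report. The definition `double` and the hypothesis `h` are hypothetical names chosen for illustration. The point is only that `rw` looks for the pattern `a + a` in the goal as written, and a plain `def` hides that structure until it is unfolded explicitly.

```lean
-- Hypothetical definition, used only to illustrate the point.
def double (n : Nat) : Nat := n + n

example (a : Nat) (h : a + a = 10) : double a = 10 := by
  -- Calling `rw [h]` at this point fails: the goal mentions `double a`,
  -- and `rw` does not unfold the `def` to find `a + a` inside it.
  unfold double  -- goal becomes: a + a = 10
  rw [h]         -- rewrites to 10 = 10, which `rw` closes by `rfl`
```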

10 comments

saberience 2 hours ago

That article is literally a definition of TDD that has been around for years and years. There's nothing novel there at all. It's literally test driven development.

jatins 9 hours ago

If the agent is writing the tests itself, does that offer better correctness guarantees than just letting it write the code and tests?

MillionOClock 8 hours ago
It is definitely not foolproof, but IMHO it is, to some extent, easier to describe what you expect to see than to implement it, so I don't find it unreasonable to think it might provide some advantages in terms of correctness.
- stingraycharles 7 hours ago
  
  That definitely depends upon the situation. More often than not, properly testing a component takes me more time than writing it.
  
rvz 6 hours ago
Given the issues at AWS with Kiro and at GitHub, we already have a few high-profile examples of what happens when AI is used at scale, even when you let it generate tests, which is something you should absolutely not do.
Otherwise, in some cases, you get this issue [0].
[0] https://sketch.dev/blog/our-first-outage-from-llm-written-co...
- vlfig 11 minutes ago
  
  Don't "let it" generate tests. Be intentional. Define them in a way that's slightly oblique to how the production code approaches the problem, so the seams don't match. Heck, that's why it's good to write them before even thinking about the prod side.
- louiskottmann 4 hours ago
  
  The linked article does not speak of tests; it speaks of a team that failed to properly review an LLM refactor and then proceeded to blame the tooling.
  LLMs are good at writing tests in my experience.
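
One way to read vlfig's advice above about tests whose seams don't match the production code: have the test state properties of a correct result instead of mirroring the steps that build it. The sketch below is only an illustration under that assumption. `merge_sorted` and `check_merge` are hypothetical names, not taken from any of the linked articles.

```python
from collections import Counter


def merge_sorted(a, b):
    """Hypothetical production code: merge two already-sorted lists."""
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        if a[i] <= b[j]:
            out.append(a[i])
            i += 1
        else:
            out.append(b[j])
            j += 1
    # One of the inputs is exhausted; append whatever remains of the other.
    out.extend(a[i:])
    out.extend(b[j:])
    return out


def check_merge(a, b):
    """Oblique test: says what a correct result looks like, not how to build it."""
    result = merge_sorted(a, b)
    # Property 1: the output is in non-decreasing order.
    assert all(x <= y for x, y in zip(result, result[1:]))
    # Property 2: the output contains exactly the elements of both inputs.
    assert Counter(result) == Counter(a) + Counter(b)
    return True
```

Because `check_merge` never walks two indices the way `merge_sorted` does, an off-by-one in the merge loop is unlikely to be duplicated in the test, which is the kind of mismatch the comment seems to be asking for.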

skanga 11 hours ago

TDD == Prompt Engineering, for Agentic coding tasks.

_boffin_ 9 hours ago

Wild it’s taken people this long to realize this. Also: lean tickets / tasks with all the context needed to complete the task, including references / docs, places to look in the source, acceptance criteria, and other stuff.