Comment by stouset

4 days ago

I'm the opposite. I held this view for a long, long time. About two months ago, I gave Zed's agentic sidebar a try.

I'm blown away.

I'm a very senior engineer. I have extremely high standards. I know a lot of technologies top to bottom. And I immediately found it insanely helpful.

There are a few hugely valuable use cases for me. The first is writing tests. Agentic AI right now is shockingly good at figuring out what your code should be doing and writing tests that exercise that behavior, cover all the verbose and annoying edge cases, and even surface bugs in your implementation. It's goddamn near magic. That's not to say they're perfect: sometimes they get confused and assume your implementation is correct when a test doesn't pass, and sometimes they misunderstand. But the overall improvement for me has been enormous. They also generally write good tests: refactoring never breaks the tests they've written unless an actually-visible behavior change has happened.
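
To make that concrete, here's the kind of behavior-focused test I mean. This is a minimal, entirely hypothetical sketch (the `slugify` function is invented for illustration, not from any real codebase): the tests pin down observable behavior and the ugly edge cases, so refactoring the internals doesn't break them.

```python
# Hypothetical example: the tests assert observable behavior, not
# implementation details, so internal refactors leave them green.

def slugify(title: str) -> str:
    """Turn a post title into a URL slug (illustrative implementation)."""
    cleaned = "".join(c if c.isalnum() else " " for c in title.lower())
    return "-".join(cleaned.split())

def test_slugify_lowercases_and_joins_words():
    assert slugify("Hello World") == "hello-world"

def test_slugify_collapses_punctuation_and_whitespace():
    # The verbose, annoying edge cases: repeated separators, stray junk.
    assert slugify("  Rust -- vs. Go!  ") == "rust-vs-go"

def test_slugify_handles_empty_input():
    assert slugify("") == ""
```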

Second is figuring out answers to really thorny problems. I'm extremely good at this, but agentic AI has made me faster. It can prototype approaches I want to try faster than I can, and we can see extremely quickly whether an approach works. I might not use the code it wrote, but the ability to rapidly give four or five alternatives a go, versus the one or two I would personally have time for, is massively helpful. I've even had them find approaches I never would have considered that ended up being my clear favorite. They're not always better than me at choosing which one to go with (I often ask for their summarized recommendations), but the sheer speed with which they get them done is a godsend.

Finding the source of tricky bugs is one more case where they excel. I can do this work too, but again, they're faster. They'll write multiple tests with debugging output that leads to the answer in barely more time than it takes to just run the tests. A bug that might take me an hour to track down can take them five minutes. Even for a really hard one, I can set them on the task while I go make coffee or take the dog for a walk. They'll figure it out while I'm gone.
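
For a sense of what that looks like, here's a contrived sketch (the `running_mean` function and its bug are invented for illustration) of the kind of throwaway diagnostic test an agent writes: it prints intermediate state so the failing step is obvious from the output alone.

```python
# Hypothetical sketch: a deliberately buggy function plus a diagnostic
# test whose debug output localizes the bug to a single step.

def running_mean(xs):
    """Running average of a sequence. Bug: floor division truncates."""
    total, out = 0, []
    for i, x in enumerate(xs, start=1):
        total += x
        out.append(total // i)  # bug: should be total / i
    return out

def test_running_mean_with_debug_output():
    xs = [1, 2, 2]
    result = running_mean(xs)
    for i, got in enumerate(result, start=1):
        print(f"after {i} values: got {got}, expected {sum(xs[:i]) / i}")
    # The printed output shows the first divergence at step 2, pointing
    # straight at the division; the assertion then fails as expected.
    assert result == [1.0, 1.5, 5 / 3]
```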

Lastly, when I have some spare time, I love asking them which areas of a codebase could use some love and which offer the biggest reward-to-effort wins. They are great at finding those places and helping me constantly make things just a little bit better, one spot at a time.

Overall, it's like having an extremely eager and prolific junior assistant with an encyclopedic brain. You have to give them guidance and take some of their work with a grain of salt, but used correctly they're insanely productive. And as a bonus, unlike with a real human, you never have to feel guilty about throwing away their work if it doesn't make the grade.

> Agentic AI right now is shockingly good at figuring out what your code should be doing and writing tests that exercise that behavior, cover all the verbose and annoying edge cases,

That's a red flag for me. Having a lot of tests usually presumes that your domain is fully known, so that you can specify it fully with tests. But in a lot of settings, the domain is a bunch of business rules that product decides on the fly. So you need to be pragmatic and only write tests against valuable workflows, or you'll find yourself changing one line and watching 100+ tests break.

  • If you can write tests fast enough, you can specify those business rules on the fly. The ideal case is that tests always reflect the current business rules. Often that's infeasible because of the speed at which those rules change, but I’ve had a similar experience of AI just getting tests right, and, even better, getting tests verifiably right, because they're so easy to read through myself. That makes it way easier to change tests rapidly.

    This also ignores that, ideally, business logic is implemented as a combination of smaller, stabler components that can be independently unit tested (a sketch of what I mean follows this thread).

    • Unit tests' value shows mostly when integration and more general tests are failing: they let you filter some sections out of the culprit list (you don’t want to spend days specifying the headlights if the electrical design is wrong or the car can’t start).

      Having a lot of tests is great until you need to refactor them. I would rather have a few e2e tests for smoke testing and valuable workflows, integration tests for business rules, and unit tests where they actually matter, as long as I can change implementation details without touching the tests much.

      Code is a liability. Unless it’s code you don’t have to deal with yourself (assembler and compiler output), reducing the amount of code is a good strategy.

  • This is a red flag for me. Any given user-facing software project with changing requirements is still built on top of relatively stable, consistent lower layers. You might change the business rules on top of those layers, but you need generally reasonable and stable internal APIs.

    Not having this is very indicative of a spaghetti soup architecture. Hard pass.

  • Tests breaking when you change things is... kind of the point?

    • You can over-specify. When the rules are stringent, it's best to have extensive test suites (like Formula 1). But when it's just a general app, you need to be pragmatic. It's like having an oversensitive sensor in some systems.
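
To sketch the earlier point about smaller, stabler components: isolate the volatile business rule as a small pure unit with its own tests, and let a thin integration or e2e layer cover the workflow around it. This `bulk_discount` rule is entirely hypothetical, just for illustration; when product changes the rule, only this unit and its tests move.

```python
# Hypothetical sketch: a volatile business rule isolated as a small,
# pure, independently unit-testable component.

from decimal import Decimal

def bulk_discount(subtotal: Decimal) -> Decimal:
    """Illustrative rule: 10% off orders of 100.00 or more."""
    if subtotal >= Decimal("100.00"):
        return subtotal * Decimal("0.10")
    return Decimal("0.00")

def test_discount_applies_at_threshold():
    assert bulk_discount(Decimal("100.00")) == Decimal("10.00")

def test_no_discount_below_threshold():
    assert bulk_discount(Decimal("99.99")) == Decimal("0.00")
```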

What languages and contexts are you getting these good results for?