
Comment by hintymad

6 days ago

How do people ensure that AI doesn't produce the subtle, stupid mistakes that humans usually don't make, like the one at Amazon that deleted an entire production deployment?

When a person writes code, they reason through it multiple times, step by step, so that they at least don't make stupid or obvious mistakes. That level of close examination is not covered in code review. And arguably this is why we can trust human-written code more than AI-produced code, even though AI can probably write better code at a smaller scale.

In contrast, Amazon asked senior engineers to review AI-generated code before merging it. But the purpose of code review was never to catch all the bugs -- that is the job of test cases, right? Besides, the more senior an engineer is at Amazon, the more meetings they attend, and the less context they have about the code. How can they be effective in code review?

> How do people ensure that AI doesn't produce the subtle, stupid mistakes that humans usually don't make, like the one at Amazon that deleted an entire production deployment?

Right... because humans have never, ever accidentally rm -rf'd a production system in the wrong spot?

GitLab has entered the chat.

Or the time an S3 engineer purged more capacity than intended, causing internet-wide outages.

All the above are from 2017.

Then there was the deployment goof at Knight Capital Group: they lost $440 million in 45 minutes. The company went poof in 2012 as a result.

And back in 2019, MySpace deleted a ton of users' production data during a botched migration (oops).

Humans make mistakes. Anyone arrogant enough to think they write only perfect code is delusional.

We are holding AI to higher standards than we hold humans, who are just as fallible, if not worse.

QA your software; stop letting developers test their own changes.

  • I was not saying that humans are always right. I was assuming that humans would probabilistically make fewer obvious mistakes, which of course could be wrong, hence my question. Speaking of QA: we would require AI to generate the test cases too, right? If so, do we rely on human review to ensure the quality of those test cases?

    • Correct, and this is why I built GuardRails (inspired by Beads): AI just keeps going without checking things. I force my agents to run the tests, verify the project builds, and even require human validation that new features work, as well as ensuring the tests pass.
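I can't speak to GuardRails' internals, but the gate described above can be sketched in a few lines: run a series of checks (build, tests) and block the agent's change unless every one exits cleanly. The commands here are stand-in placeholders, not GuardRails' actual API:

```python
import subprocess

# Hypothetical guardrail: a change is accepted only if every check
# command exits with status 0. Real projects would substitute their
# actual build and test commands (e.g. ["make"], ["pytest", "-q"]).
CHECKS = [
    ["python", "-c", "print('build ok')"],   # stand-in for a build step
    ["python", "-c", "print('tests ok')"],   # stand-in for the test suite
]

def gate(checks=CHECKS):
    """Run each check in order; return True only if all succeed."""
    for cmd in checks:
        result = subprocess.run(cmd, capture_output=True, text=True)
        if result.returncode != 0:
            print(f"guardrail failed: {cmd!r}")
            return False
    return True

if __name__ == "__main__":
    print("merge allowed" if gate() else "merge blocked")
```

A human-validation step would sit on top of this, e.g. requiring an explicit approval flag before the final merge rather than letting the agent self-certify.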