← Back to context

Comment by kenferry

14 hours ago

Hm, I wonder.

I've done some work in this sort of area before, though not literally on a malloc. Yes you very much want to be careful, but ultimately it's the tests that give you confidence. Pound the heck out of it in multithreaded contexts and test for consistency.

AI is more than happy to declare the test wrong and “fix it” if you’re not careful. And the cherry on top is that sometimes the test could be wrong or need updating due to changed behavior. So…

> ...but ultimately it's the tests that give you confidence. Pound the heck out of it in multithreaded contexts and test for consistency.

I don't think so.

Even on LLM generated code, it is still not enough and you cannot trust it. They can pass the tests and still cause a regression and the code will look seemingly correct, for example in this case study [0].

[0] https://sketch.dev/blog/our-first-outage-from-llm-written-co...