
Comment by fooker

5 hours ago

I have worked on some of the most supposedly reliable codebases on earth (compilers) for several decades, and most of the code in compilers is pretty bad.

And most of the code the compiler is expected to compile, seen from the perspective of fixing compiler bugs and issues, is absolutely terrible. The day that code can be rewritten or improved reliably with AI can't come fast enough.

I honestly do not see how training AI on 'mountains of garbage' would have any other outcome than more garbage.

I've seen lots of different codebases from the inside, some good, some bad. As a rule: smaller codebase + small team = better; bigger codebase + more participants = worse.

  • The way it seems to work now is to task agents with writing a good test suite. AI is much better at this than it is at writing code from scratch.

    Then you just let it iterate until the tests pass. If you are not happy with the design, suggest a new design and let it rip (a rough sketch of the loop is below this thread).

    All this is expensive and wasteful now, but becoming 100-1000x cheaper is something that has happened with every technology we have invented.

  • That's why the major AI labs are really careful about the code they include in the training runs.

    The days of indiscriminately scraping every scrap of code on the internet and pumping it all in are long gone, from what I can tell.

    • Well, if, as the OP points out, it is 'all garbage', then they don't have much room to discriminate.

    • Do you have pointers to this?

      It would be a great resource for understanding what works and what doesn't.
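
A minimal sketch of the "iterate until tests pass" loop described in the first reply above, assuming a pytest-based suite; `propose_patch` and `apply_patch` are hypothetical stand-ins for the agent call and the patch-application step, not any particular tool's API.

```python
import subprocess

def run_tests() -> tuple[bool, str]:
    """Run the project's test suite and return (passed, combined output)."""
    result = subprocess.run(["pytest", "-q"], capture_output=True, text=True)
    return result.returncode == 0, result.stdout + result.stderr

def iterate_until_green(propose_patch, apply_patch, max_rounds: int = 20) -> bool:
    """Feed failing test output back to the agent until the suite passes.

    propose_patch(output) -> patch : hypothetical agent call that reads the
        failing test output and proposes a code change.
    apply_patch(patch)             : hypothetical step that applies the change
        to the working tree.
    """
    for _ in range(max_rounds):
        passed, output = run_tests()
        if passed:
            return True
        patch = propose_patch(output)  # agent reads failures, proposes a fix
        apply_patch(patch)             # apply it and try again
    return False
```

The key design point is that the test suite, not the agent, is the arbiter: the loop only terminates successfully when the suite goes green, and a round limit keeps a stuck agent from iterating forever.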