Comment by atonse

12 days ago

It sounds to me like you may not have used many of these tools yet, because your response reads like pushback grounded in theory rather than hands-on experience.

Please try the tools (especially either Claude Code with Opus 4.5, or OpenAI Codex 5.2). Not at all saying they're perfect, but they are much better than you currently think they might be (judging by your statements).

AI code reviews are already quite good, and are only going to get better.

Why is the go-to always "you must not have used it" rather than the much more likely explanation: that I have already seen, first-hand, the slop it churns out and rejected it? Synthetic benchmarks can rise all they want; Opus 4.5 is still completely useless at all but the most trivial F# code and, in more mainstream affairs, continues to choke even on basic ASP.NET Core configuration.

  • About a year ago they sucked at writing elixir code.

    Now I use them to write nearly 100% of my elixir code.

    My point isn’t a static “you haven’t tried them”. My point is, “try them every 2-3 months and watch the improvements, otherwise your info is outdated”

> It sounds to me like you may not have used a lot of these tools yet

And this is increasingly becoming the default answer I get whenever I point out obvious flaws in LLM coding tools.

Did it occur to you that I know these flaws precisely because I work a lot with, and evaluate the performance of, LLM-based coding tools? Also, we're almost four years into the alleged "AI Boom" now. It's pretty safe to assume that almost everyone in a development capacity has spent at least some effort evaluating how these tools do. At this point, stating "you're using it wrong" is like assuming that people in 2010 didn't know which way to hold a smartphone.

Sorry not sorry, but when every criticism of a tool elicits the response that people are not using it well, then maybe, just maybe, the flaw is not with all those people, but with the tool itself.

  • Spending 4 years evaluating something that’s changing every month means almost nothing, sorry.

    Almost every post exalting these models’ capabilities talks about how good they’ve gotten since November 2025. That’s barely 90 days ago.

    So it’s not about “you’re doing it wrong”. It’s about “if you last tried it more than 3 months ago, your information is already outdated”

    • > Spending 4 years evaluating something that’s changing every month means almost nothing, sorry.

No need to be sorry, because if we accept that premise, you have just countered your own argument.

If my evaluating these things for the past 4 years "means almost nothing" because they are changing so rapidly... then by the same logic, any experience with them also "means almost nothing". If the window for gaining experience with these models before that experience becomes irrelevant is as short as 90 days, then there is barely any difference between someone with experience and someone just starting out.

      Meaning, under that premise, as long as I know how to code, I can evaluate these models, no matter how little I use them.

      Luckily for me though, that's not the case anyway because...

      > It’s about “if you last tried it more than 3 months ago,

...guess what: I try these almost every week. It's part of my job to do so.