
Comment by threethirtytwo

15 days ago

A human doing the same task the way the LLM did it in the paper would degrade the document further than the LLM. If the LLM degrades it by 25%, a human using the same technique would probably degrade it by 80%. I'm talking about a single pass.

The fact of the matter is, humans don't edit things the way it was done in the paper, and neither do coding agents like Claude. Think about it: you do not ingest an entire paper and then regurgitate it with a single targeted edit... and neither do coding agents.
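To make the "single targeted edit" point concrete, here is a minimal Python sketch of an edit that touches only one span and copies everything else through unchanged, as opposed to regenerating the whole document. The function name `apply_targeted_edit` and the sample document are hypothetical, made up for illustration; this is not the method from the paper, nor a claim about how any particular coding agent works internally.

```python
def apply_targeted_edit(lines, start, end, replacement):
    """Replace lines[start:end] (0-based, end exclusive); leave the rest untouched."""
    if not (0 <= start <= end <= len(lines)):
        raise ValueError("edit span is outside the document")
    # Only the edited span is rewritten; every other line is copied as-is,
    # so it cannot be degraded by the edit.
    return lines[:start] + list(replacement) + lines[end:]


# Hypothetical three-line document with a typo on the second line.
doc = ["Intro paragraph.", "The methd section.", "Conclusion."]
edited = apply_targeted_edit(doc, 1, 2, ["The method section."])
```

With this approach, lines outside the span are identical before and after by construction, which is roughly why a diff-style edit should not show the kind of whole-document drift being discussed.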

Also think carefully. A 25% degradation rate is unacceptable in industry. The AI shift that's taking over all of SWE development would not actually exist if there were 25% degradation... that's way too much.

Are we comparing humans to LLMs or human written software to LLMs?

The whole point of creating software to do things used to be getting things done more accurately and consistently.

  • No. The whole point of creating software is getting things done.

    "More accurately and consistently" was merely downstream from what capabilities were natural for machine logic and hard algorithms.

    Now, we're just spoiled for choice. We have hard algorithm software for when we want to do things that benefit from accurate, consistent, highly deterministic behavior - and we have soft algorithm AI for when we want to do things that simply aren't amenable to hard logic.

    Machine translation used to be a horrid mess when we were trying to do it with symbolic systems. Because symbolic systems are "consistent, highly deterministic" but not at all "accurate" on translation tasks. Being able to leverage LLMs for that is a generational leap.

    • All of software is hard-coded algorithm.

      If you distinguish between AI source code and engineer source code, say so. "Getting things done" is a business need. The part of it that gets translated into a deterministic language executable by a computer is code.

      There are entire languages designed for lesser engineers/domain experts to formulate business requirements.

      Anyhow, what's your point? That we received a framework for "soft algorithms" where the output does not need to be correct and deducible? What's even the point of putting that into software? Just forward your input to the reader and let them judge on their own.


Except that coding agents will do this at times. That's half the problem. A human will forget details and exaggerate others, but LLMs fail in spectacular ways that humans rarely would, like trying to copy a document from memory rather than one word at a time, side by side, or rewriting the whole thing just to make some simple changes. Coding agents will delete tests or return True to get them to pass - something you would never expect of even a junior professional.

And I know this because I see it all the time. I use composer-2 and sonnet 4.6 on a regular basis. It's not much better for my colleagues who use Opus or GPT or any of the other frontier models. Most of the time it's fine, but other times it does things that would be simply unforgivable for a human. I have to watch the agent closely so that it doesn't decide to nuke my database; I don't have to do that with any of my juniors, even those with little experience and poor discipline.

  • > nuke

    > I don’t have to do that with any of my juniors…

    For some values of “nuke,” I absolutely have had to do that with juniors in the past. Perhaps you’re referring to a single rm -r or a hilarious force push or something, but undertrained and unsupervised juniors regularly introduce things like SQL injection, XSS, etc. simply because they don’t know any better yet. This isn’t saying “AI is better across the board” - I just don’t think they’re comparable, and I also think AI shouldn’t be used to chop the bottom 5 rungs off our career ladder. But let’s not pretend juniors can be left alone with a codebase without any worries.
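For anyone who hasn't seen the SQL injection mistake mentioned above, here is a minimal sketch using Python's standard `sqlite3` module. The `users` table, the two helper functions and the payload are all hypothetical, invented for illustration; the point is only the difference between building a query with string formatting and passing the value as a bound parameter.

```python
import sqlite3

# Throwaway in-memory database with a hypothetical users table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, is_admin INTEGER)")
conn.executemany("INSERT INTO users VALUES (?, ?)", [("alice", 1), ("bob", 0)])


def find_user_unsafe(conn, name):
    # The classic mistake: user input is pasted straight into the SQL text,
    # so quotes in the input can change the meaning of the query.
    query = f"SELECT name FROM users WHERE name = '{name}'"
    return conn.execute(query).fetchall()


def find_user_safe(conn, name):
    # Parameterized query: the driver passes the value separately from the
    # SQL text, so it is always treated as data, never as SQL.
    return conn.execute(
        "SELECT name FROM users WHERE name = ?", (name,)
    ).fetchall()


# Input crafted to turn the WHERE clause into a condition that is always true.
payload = "nobody' OR '1'='1"

leaked = find_user_unsafe(conn, payload)  # matches every row in the table
safe = find_user_safe(conn, payload)      # matches nothing; no user has that name
```

The unsafe version looks perfectly reasonable if nobody has told you why it's wrong, which is the kind of thing review is supposed to catch, whether the author is a junior or an agent.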