Comment by datadeft

5 months ago

I am not sure how good these Exercism tasks are for measuring how good a model is at coding.

My experience is that these models can write a simple function and get it right as long as it does not require any out-of-the-box thinking (so essentially offloading the boilerplate part of coding). When it comes to thinking creatively and finding a much better solution to a specific task, one that requires thinking 2-3 steps ahead, they are not suitable.

I think many of the "AI can do coding" narratives don't see what coding means in real situations.

It's finding out why "jbdoe1337" added this large if/else around the entire function body back in 2016 - it seems to be important business logic, but the commit just says "updated code". And how the h*ll the interaction between the conf.ini files, the conf/something.json and the ENV vars works. Why the ENV var sometimes overrides a value in the ini and why it's sometimes the other way around. But also finding that when you clean it up, everything falls apart.
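
To make that config headache concrete, here is a minimal hypothetical sketch of the kind of layered lookup that grows in legacy codebases (the file names and the "locked keys" rule are invented for illustration, not taken from any real project): the override order quietly depends on the key, which is exactly why cleaning it up breaks things.

    import configparser
    import json
    import os

    def load_setting(key: str) -> str | None:
        """Resolve a setting from layered config sources.

        Hypothetical legacy rule: the ENV var normally wins, but keys
        listed under "locked" in conf/something.json pin the ini value,
        silently reversing the override order for just those keys.
        """
        ini = configparser.ConfigParser()
        ini.read("conf.ini")
        ini_value = ini.get("main", key, fallback=None)

        with open("conf/something.json") as f:
            locked = set(json.load(f).get("locked", []))

        env_value = os.environ.get(key.upper())

        if key in locked:                   # ini wins for "locked" keys...
            return ini_value or env_value
        return env_value or ini_value       # ...ENV wins for everything else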

It's discussing with the stakeholders why "adding a delete button" isn't as easy as just putting a button there, but that it means designing a whole cascading deletion strategy and/or trashcan and/or soft-delete and/or garbage-collection.
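
To see why, here is one minimal soft-delete sketch (table and column names are invented for illustration, and it assumes a SQLite-style db handle): the row is flagged rather than removed, every read path must now filter flagged rows out, and a separate garbage-collection job does the real deletion later.

    from datetime import datetime, timedelta, timezone

    def soft_delete(db, item_id: int) -> None:
        # Flag the row instead of removing it, so it can be restored
        # from a "trashcan" and child records aren't orphaned at once.
        db.execute(
            "UPDATE items SET deleted_at = ? WHERE id = ?",
            (datetime.now(timezone.utc).isoformat(), item_id),
        )

    def list_items(db):
        # Every read path must now remember to exclude trashed rows.
        return db.execute("SELECT * FROM items WHERE deleted_at IS NULL")

    def garbage_collect(db, retention_days: int = 30) -> None:
        # A periodic job performs the real (cascading) deletion later.
        cutoff = datetime.now(timezone.utc) - timedelta(days=retention_days)
        db.execute(
            "DELETE FROM items WHERE deleted_at < ?",
            (cutoff.isoformat(),),
        )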

It's finding out why - again - the grumb pipeline crashes with the typebar checker when used through the mpm-yearn package manager. Both in containers and on an OSX machine, but not on Linux Brobuntu 22.12 LTLS.

It's moving stuff into the right abstraction layer. It's removing abstractions while introducing others. KISS vs. future flexibility. It's the gut feeling for when to apply DRY and when to embrace duplication.

And then, if you're lucky, churning out boilerplate or new code for 120 minutes a week.

I'm glad that these 120 minutes can be cut to 20 with AI. Truly. But this is not what (senior?) programmers do, despite what the hyped-up AI press makes us believe. It only shows they have no idea what the "real" problems and time-consumers are for programmers.

  • Systems built from scratch with AI won't have these limitations, because only the model will ever see the code. It will implement a spec that's written in English or another human language.

    When the business requirements change, the spec will change. When that happens, the system will either modify its previously-written code or regenerate it from the ground up. Which strategy it chooses won't be especially interesting or important.

    The process of maintaining the English-language spec will still require great care and precision. It will be called "programming," or perhaps "coding."

    A few graybearded gurus will insist on examining the underlying C or JavaScript or Python or Rust or whatever the model generates, the way they peer at compiler-generated assembly code now. Occasionally this capability will be important, even vital. But not usually. The situations where it's necessary will become less common over time.

    • > It will implement a spec that's written in English or another human language.

      No, it won't, because "human languages" lack the precision to describe such a spec. This is exactly why programming languages exist in the first place: a language that humans understand but that allows for precise and unambiguous specifications and/or instructions. Do note that a computer cannot execute "Python" or "C" either; we need to translate (compile) it first. Edit: a programmer doesn't just type curly brackets and semicolons in the right place, she takes vague and ambiguous specs and makes them precise enough that machines can repeat them.

      As kids we had this joke (it works better in Dutch).

      John gets into an accident and loses both his arms. A doctor gives him futuristic voice-controlled prosthetics.

      John: "Pick up coffee-mug". "Bring to mouth to drink". woa! impressed he goes home.

      John, all excited: "unzip pants", "grab d#ck", "jerk off"

      (in Dutch, "trek af" means both "rip off" and "w#nk")

      Jokes aside, we do have such a language that's not a programming language in the common sense: executable specs, i.e. end-to-end tests. Gherkin is a famous one, but certainly not the only one. BDD, where the B is described by humans in a DSL and the DD is performed by AI - I could imagine this working (see the sketch below). Not currently and not anytime soon (current LLMs are great at making new stuff, horrible at changing existing stuff), but it might work.

      We'd then end up with just another programming language, but one that's more accessible to more people, I guess. And the AI is "just a compiler" in that sense.
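
      As a rough sketch of that "AI is just a compiler" idea (the feature text and step names are invented; the step glue uses Python's real behave library): the human-readable Gherkin spec is the artifact people would maintain, and the step implementations below are the part an AI could generate and regenerate.

          # features/delete.feature - the spec a non-programmer could own:
          #
          #   Scenario: Deleting an item moves it to the trashcan
          #     Given an item named "report.pdf" exists
          #     When the user deletes "report.pdf"
          #     Then "report.pdf" is no longer listed
          #
          # The step glue below is what the AI "compiler" would (re)generate.
          from behave import given, when, then

          @given('an item named "{name}" exists')
          def create_item(context, name):
              context.store = getattr(context, "store", {})
              context.store[name] = {"deleted": False}

          @when('the user deletes "{name}"')
          def delete_item(context, name):
              context.store[name]["deleted"] = True  # soft delete

          @then('"{name}" is no longer listed')
          def check_item_gone(context, name):
              listed = [n for n, v in context.store.items() if not v["deleted"]]
              assert name not in listed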

    • I haven't seen evidence that this will come to pass, but it's possible. English-language specs are ambiguous. Do you really think businesses with money on the line will tolerate an LLM making automated changes to a codebase and pushing them without a human in the loop? Even human programmers create outages (and we humans already have the "AGI" that is supposedly the holy grail). If an autonomous LLM creates outages 10% more frequently than a team of humans, it is basically unusable. We would need to see a lot more improvement over the current state of the art.

  • Exactly. People sold on AI replacing software engineers are missing the point. It is almost like saying that better laptops are replacing software engineers. LLMs are just tools that make you faster. Finding bugs, writing documentation, etc. are very nice to accelerate, but creative thinking is also a big part of the job.