LLMs as the new high level language

5 days ago (federicopereiro.com)

Are these kinds of articles a new breed of rage bait? They keep ending up on the front page with thriving comment sections, but in terms of content they're pretty low in nutritional value.

So I'm guessing they just rise because they spark a debate?

One of the reasons we have programming languages is they allow us to express fluently the specificity required to instruct a machine.

For very large projects, are we sure that English (or other natural languages) are actually a better/faster/cheaper way to express what we want to build? Even if we could guarantee fully-deterministic "compilation", would the specificity required not balloon the (e.g.) English out to well beyond what (e.g.) Java might need?

Writing code will become writing books? Still thinking through this, but I can't help but feel natural languages are still poorly suited and slower, especially for novel creations that don't have a well-understood (or "linguistically-abstracted") prior.

Isn't this a little bit of a category error? LLMs are not a language. But prompts to LLMs are written in a language, more or less a natural language such as English. Unfortunately, natural languages are not very precise and full of ambiguity. I suspect that different models would interpret wordings and phrases slightly differently, leading to behaviors in the resulting code that are difficult to predict.

The article starts with a philosophically bad analogy in my opinion. C -> Java != Java -> LLM, because the intermediate product (the code) changed its form with previous transitions. LLMs still produce the same intermediate product. I expanded on this in a post a couple of months back:

https://www.observationalhazard.com/2025/12/c-java-java-llm....

"The intermediate product is the source code itself. The intermediate goal of a software development project is to produce robust maintainable source code. The end product is to produce a binary. New programming languages changed the intermediate product. When a team changed from using assembly, to C, to Java, it drastically changed its intermediate product. That came with new tools built around different language ecosystems and different programming paradigms and philosophies. Which in turn came with new ways of refactoring, thinking about software architecture, and working together.

LLMs don’t do that in the same way. The intermediate product of LLMs is still the Java or C or Rust or Python that came before them. English is not the intermediate product, as much as some may say it is. You don’t go prompt->binary. You still go prompt->source code->changes to source code from hand editing or further prompts->binary. It’s a distinction that matters.

Until LLMs are fully autonomous with virtually no human guidance or oversight, source code in existing languages will continue to be the intermediate product. And that means many of the ways that we work together will continue to be the same (how we architect source code, store and review it, collaborate on it, refactor it, etc.) in a way that it wasn’t with prior transitions. These processes are just supercharged and easier because the LLM is supporting us or doing much of the work for us."

  • What would you say if someone has a project written in, let's say, PureScript, and then they use a Java backend to generate (and overwrite) Java code, which they also keep under version control? If they claim that this is a Java project, you would probably disagree, right? Seems to me that LLMs are the same thing, that is, if you also store the prompt and everything else needed to reproduce the same code generation process. Since LLMs can be made deterministic, I don't see why that wouldn't be possible.

    • A deterministic prompt + seed used to generate an output is interesting as a way to record deterministically how code came about, but it's also not a thing people are actually doing. Right now, everyone is slinging around LLM outputs without any attempt at reproducibility; no seed, nothing. What you've described and what the article describes are very different.
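
      Something like a "prompt lockfile" would be the bare minimum, roughly (a sketch with made-up field names; the point is just pinning everything the output depends on, down to the exact model snapshot):

         import hashlib, json

         generated_code = "...whatever the model returned..."  # placeholder output

         record = {
             "model": "example-model-2025-01",  # an exact snapshot, not a floating alias
             "prompt": "Implement a token-bucket rate limiter in Python.",
             "temperature": 0,
             "seed": 42,
             "output_sha256": hashlib.sha256(generated_code.encode()).hexdigest(),
         }

         with open("prompt.lock.json", "w") as f:
             json.dump(record, f, indent=2)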

    • PureScript is a programming language. English is not. A better analogy would be what would you say about someone who uses a No Code solution that behind the scenes writes Java. I would say that's a much better analogy. NoCode -> Java is similar to LLM -> Java.

      I'm not debating whether LLMs are amazing tools or whether they change programming. Clearly both are true. I'm debating whether people are using accurate analogies.

      4 replies →

IDK how everyone else feels about it, but a non-deterministic “compiler” is the last thing I need.

  • A compiler that can turn cash into improved code without round-tripping through a human is very cool though. As those steps can get longer and succeed more often in more difficult circumstances, what it means to be a software engineer changes a lot.

    • LLMs may occasionally turn bad code into better code but letting them loose on “good” or even “good enough” code is not always likely to make it “better”.

  • I may have bad news for you on how compilers typically work.

    • The difference is that what most languages compile to is much much more stable than what is produced by running a spec through an LLM.

      A language or a library might change the implementation of a sorting algorithm once every few years. An LLM is likely to do it every time you regenerate the code.

      It’s not just a matter of non-determinism either, but about how chaotic LLMs are. Compilers can produce different machine code with slightly different inputs, but it’s nothing compared to how wildly different LLM output is with very small differences in input. Adding a single word to your spec file can cause the final code to be far more unrecognizably different than adding a new line to a C file.

      If you are only checking in the spec, which is the logical conclusion of “this is the new high level language”, then every time you regenerate your code, all of the thousands upon thousands of unspecified implementation details will change.

      Oops, I didn’t think I needed to specify what’s going to happen when a user tries to do C before A but after B. Yesterday it didn’t seem to do anything but today it resets their account balance to $0. But after the deployment 5 minutes ago it seems to be fixed.

      Sometimes users dragging a box across the screen will see the box disappear behind other boxes. I can’t reproduce it though.

      I changed one word in my spec and now there’s an extra 500k LOC to implement a hidden asteroids game on the home page that uses 100% of every visitor’s CPU.

      This kind of stuff happens now, but the scale at which it will happen if you actually use LLMs as a high level language is unimaginable. The chaos of all the little unspecified implementation details constantly shifting is just insane to contemplate as a user or a maintainer.

    • Deterministic compilation, aka reproducible builds, has been a basic software engineering concept and goal for 40+ years. Perhaps you could provide some examples of compilers that produce non-deterministic output along with your bad news.

    • If you are referring to timestamps, build IDs, compile-time environments, hardwired heuristics for optimization, or even bugs in compilers -- those are not the same kind of non-determinism as in LLMs. The former can be mitigated by long-standing practices of reproducible builds, while the latter is intrinsic to LLMs if they are meant to be more useful than a voice recorder.

    • Compilers aim to be fully deterministic. The biggest source of nondeterminism when building software isn't the compiler itself, but build systems invoking the compiler nondeterministically (because iterating the files in a directory isn't necessarily deterministic across different machines).
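
      As a rough check (assuming gcc on the PATH and a local hello.c; embedded timestamps or build paths can still break it, which is exactly what reproducible-builds practices like SOURCE_DATE_EPOCH address), compiling the same source twice should hash identically:

         import hashlib, subprocess

         def build_and_hash(out: str) -> str:
             # Compile hello.c and hash the resulting binary.
             subprocess.run(["gcc", "-O2", "-o", out, "hello.c"], check=True)
             with open(out, "rb") as f:
                 return hashlib.sha256(f.read()).hexdigest()

         print(build_and_hash("a.out.1") == build_and_hash("a.out.2"))  # typically True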

    • You'll need to share with the class because compilers are pretty damn deterministic.

    • Compilers are about 10 orders of magnitude more deterministic than LLMs, if not more.

I would like to hijack the "high level language" term to mean dopamine hits from using an LLM.

"Generate a Frontend End for me now please so I don't need to think"

LLM starts outputting tokens

Dopamine hit to the brain as I get my reward without having to run npm and figure out what packages to use

Then out of a shadowy alleyway a man in a trenchcoat approaches

"Pssssttt, all the suckers are using that tool, come try some Opus 4.6"

"How much?"

"Oh that'll be $200.... and your muscle memory for running maven commands"

"Shut up and take my money"

----- 5 months later, washed up and disconnected from cloud LLMs ------

"Anyone got any spare tokens I could use?"

  • > and your muscle memory for running maven commands

    Here's $1000. Please do that. Don't bother with the LLM.

  • If you're disconnected from cloud LLMs you've got bigger problems than coding can solve lol

I have a source file of a few hundred lines implementing an algorithm that no LLM I've tried (and I've tried them all) is able to replicate, or even suggest, when prompted with the problem. Even with many follow up prompts and hints.

The implementations that come out are buggy or just plain broken

The problem is a relatively simple one, and the algorithm uses a few clever tricks. The implementation is subtle...but nonetheless it exists in both open and closed source projects.

LLMs can replace a lot of CRUD apps and skeleton code, tooling, scripting, infra setup etc, but when it comes to the hard stuff they still suck.

Give me a whiteboard and a fellow engineer any day

  • Well I think that’s kind of the point or value of these tools. Let the AI do the tedious stuff, saving your energy for the hard stuff. At least that’s how I use them: just save me from all the typing and tedium. I’d rather describe something like Auth0 integration to an LLM than do it all myself. Same goes for the typical list of records: click one, view the details, then a list of related records and all the operations that go with that. It’s so boring; let the LLM do that stuff for you.

  • I'm seeing the same thing with my own little app that implements several new heuristics for functionality and optimisation over a classic algorithm in this domain. I came up with the improvements by implementing the older algorithm and just... being a human and spending time with the problem.

    The improvements become evident from the nature of the problem in the physical world. I can see why a purely text-based intelligence could not have derived them from the specs, and I haven't been able to coax them out of LLMs with any amount of prodding and persuasion. They reason about the problem in some abstract space detached from reality; they're brilliant savants in that sense, but you can't teach a blind person what the colour red feels like to see.

  • This is one of my favourite activities with LLMs as well. After implementing some sort of idea for an algorithm, I try seeing what an LLM would come up with. I give it hints as well and push it in the right direction over many iterations, but never reveal the ideal approach. And as a matter of fact, it can never reach the quality of my initial implementation.

Can we stop repeating this canard, over and over?

Every "classic computing" language mentioned, and pretty much in history, is highly deterministic, and mind-bogglingly, huge-number-of-9s reliable (when was the last time your CPU did the wrong thing on one of the billions of machine instructions it executes every second, or your compiler gave two different outputs from the same code?)

LLMs are not even "one 9" reliable at the moment. Indeed, each token is a freaking RNG draw off a probability distribution. "Compiling" is a crap shoot, a slot machine pull. By design. And the errors compound/multiply over repeated pulls as others have shown.
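
To make that concrete, here's a toy sketch with made-up logits (no real model involved): greedy argmax is repeatable, while sampling from the softmax is a fresh dice roll every time:

    import numpy as np

    rng = np.random.default_rng()
    logits = np.array([2.0, 1.5, 0.3])             # toy scores for three candidate tokens
    probs = np.exp(logits) / np.exp(logits).sum()  # softmax over the "vocabulary"

    sampled = rng.choice(len(probs), p=probs)  # an RNG draw: can differ from run to run
    greedy = int(np.argmax(probs))             # deterministic given the same logits
    print(sampled, greedy)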

I'll take the gloriously reliable classical compute world to compile my stuff any day.

  • Agreed, yet we will have to keep seeing this take over and over again. As if I needed more reasons to believe the world is filled with morons.

This is an exaggeration: if you store the prompt that was "compiled" by today's LLMs, there is no guarantee that 4 months from now you will be able to replicate the same result.

I can take some C or Fortran code from 10 years ago, build it and get identical results.

After working with the latest models I think these "it's just another tool" or "another layer of abstraction" or "I'm just building at a different level" kinds of arguments are wishful thinking. You're not going to be a designer writing blueprints for a series of workers to execute on; you're barely going to be a product manager translating business requirements into a technical specification before AI closes that gap as well. I'm very convinced non-technical people will be able to use these tools, because what I'm seeing is that all of the skills my training and years of experience have helped me hone are now implemented by these tools to a level that I know most businesses would be satisfied by.

The irony is that I haven't seen AI have nearly as large an impact anywhere else. We truly have automated ourselves out of work; people are just catching up with that fact, and the people who just wanted to make money from software can now finally stop pretending that "passion" for "the craft" was ever really part of their motivating calculus.

  • If all you (not you specifically, more of a royal “you” or “we”) are is a collection of skills centered around putting code into an editor and opening pull requests as fast as possible, then sure, you might be cooked.

    But if your job depends on taste, design, intuition, sociability, judgement, coaching, inspiring, explaining, or empathy in the context of using technology to solve human problems, you’ll be fine. The premium for these skills is going _way_ up.

    • The question isn't whether businesses will have 0 human element to them, the question is whether AI leaves a big enough gap that technical skills are still required such that technical roles are still hired for. Someone in product can have all of those skills without a computer science degree and with no design experience, and AI will do the technical work at the level of design, implementation, and maintenance. What I am seeing with the new models isn't just writing code, it's taking fundamental problems as input and designing holistic software solutions as output - and the quality is there.

      1 reply →

    • It turns out that corporations value these things right up until a cheaper, almost-as-good alternative is available.

      The writing is on the wall for all white collar work. Not this year or next, but it's coming.

      1 reply →

  • > translating business requirements into a technical specification

    a.k.a. Being a programmer.

    > The irony is that I haven't seen AI have nearly as large of an impact anywhere else.

    What lol. Translation? Graphic design?

  • > what I'm seeing is that all of the skills that my training and years of experience have helped me hone are now implemented by these tools to the level that I know most businesses would be satisfied by.

    So when things break or they have to make changes, and the AI gets lost down a rabbit hole, who is held accountable?

    • The answer is the AI. It's already handling complex issues and debugging solely by gathering its own context, doing major refactors successfully, and doing feature design work. The people who will be held responsible will be the product owners, but it won't be for bugs; it will be for business impact.

      My point is that SWEs are living on a prayer that AI will be perched on a knife's edge where there is still some amount of technical work left to keep our profession sustainable, and from what I'm seeing that's not going to be the case. It won't happen overnight, but I doubt my kids will ever even think about a computer science degree or doing what I did for work.

      10 replies →

  • > The irony is that I haven't seen AI have nearly as large of an impact anywhere else.

    We are in this pickle because programmers are good at making tools that help programmers. Programming is the tip of the spear, as far as AI's impact goes, but there's more to come.

    Why pay an expensive architect to design your new office building, when AI will do it for peanuts? Why pay an expensive lawyer to review your contract? Why pay a doctor, etc.

    Short term, doing for lawyers, architects, civil engineers, doctors, etc what Claude Code has done for programmers is a winning business strategy. Long term, gaining expertise in any field of intellectual labor is setting yourself up to be replaced.

This is a good summary of any random week's worth of AI shilling from your LinkedIn feed, that you can't get rid of.

Why use agents if there is absolutely zero need for them? It's the usual: we spent a shitton of money on this, now find out how we MUST include this horrible thing in our already bloated dev environment.

"Following this hypothesis, what C did to assembler, what Java did to C, what Javascript/Python/Perl did to Java, now LLM agents are doing to all programming languages."

This is not an appropriate analogy, at least not right now.

Code agents generate code from prompts; in that sense the metaphor is correct. However, agents then read the code, it becomes input, and they generate more code. That was never the case for compilers: an LLM used in this way is strictly not a compiler, because the process is cyclic rather than one-directional.

  • I think it's appropriate in terms of the results rather than the process; the bigger problem I see is that programming languages are designed to be completely unambiguous, whereas human language is not ("Go to the shop and buy one box of eggs, and if they have carrots, buy three") so we're transitioning from exactly specifying what we want the software to do, to tying ourselves in knots trying to specify it exactly, while a machine tries to disambiguate our request. I bet lawyers would make good vibe coders.
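
    To make the ambiguity concrete: that one sentence supports at least two different programs (a toy sketch, with hypothetical readings):

       def reading_a(has_carrots: bool) -> dict:
           # "...if they have carrots, buy three [boxes of eggs]"
           return {"egg_boxes": 3 if has_carrots else 1, "carrots": 0}

       def reading_b(has_carrots: bool) -> dict:
           # "...if they have carrots, buy three [carrots]"
           return {"egg_boxes": 1, "carrots": 3 if has_carrots else 0}

       print(reading_a(True))  # {'egg_boxes': 3, 'carrots': 0}
       print(reading_b(True))  # {'egg_boxes': 1, 'carrots': 3}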

We’re missing the boat here. There are already companies with millions in revenue that are pure agent loops of English text. They can do things our traditional software cannot.

I'm trying to work with vibe-coded applications and it's a nightmare. I am trying to make one application multi-tenant by moving a bunch of code that's custom to a single customer into config. There are 200+ line methods, dead code everywhere, tons of unnecessary complexity (for instance, extra mapping layers that were introduced to resolve discrepancies between keys, instead of just using the same key everywhere). No unit tests, of course, so it's very difficult to tell if anything broke. When the system requirements change, the LLM isn't removing old code, it's just adding new branches and keeping the dead code around.

I ask the developer the simplest questions, like "which of the multiple entry-points do you use to test this code locally?" or "you have a 'mode' parameter here that determines which branch of the code executes; which of these modes are actually used?", and I get a bunch of babble, because he has no idea how any of it works.

Of course, since everyone is expected to use Cursor for everything and move at warp speed, I have no time to actually untangle this crap.

The LLM is amazing at some things - I can get it to one-shot adding a page to a react app for instance. But if you don't know what good code looks like, you're not going to get a maintainable result.

  • You've just described the entirely-human-made project that I'm working on now.... at least now we can deliver the intractable mess much more quickly!

The side effect of using LLMs for programming is that no new programming language can now emerge and become popular; we will be stuck with the existing programming languages for broad use forever. Newer languages will never accumulate enough training data for LLMs to master them. Granted, non-LLM AIs with true neural memory can work around this, as can LLMs with an infinite-token frozen+forkable context, but these are not your everyday LLMs.

  • I don't think we need that many programming languages anyway.

    I'm more worried about the opposite: the next popular programming paradigm will be something that's hard to read for humans but not-so-hard for LLM. For example, English -> assembly.

  • I wouldn’t be surprised if in the next 5-10 years the new and popular programming language is one built with the idea of optimizing how well LLMs (or, at that point, world models) understand and can use it.

    Right now LLMs work with languages designed so that humans can understand them better via abstraction; what if the next language is designed for optimal LLM/world-model understanding?

    Or, instead of an entirely new language, there's some form of compiling/transpiling from the model language to a human-centric one: something like WASM, but for LLMs.

It's not a programming language if you can't read someone else's code, figure out what it does, figure out what they meant, and debug the difference between those things.

"I prompted it like this"

"I gave it the same prompt, and it came out different"

It's not programming. It might be having a pseudo-conversation with a complex system, but it's not programming.

  • > It's not a programming language if you can't read someone else's code, figure out what it does, figure out what they meant, and debug the difference between those things.

    Well I think the article would say that you can diff the documentation, and it's the documentation that is feeding the AI in this new paradigm (which isn't direct prompting).

    If the definition of programming is "a process to create sets of instructions that tell a computer how to perform specific tasks" there is nothing in there that requires it to be deterministic at the definition level.

  • > "I gave it the same prompt, and it came out different"

    I wrote a program in C and gave it to gcc. Then I gave the same program to clang and I got a different result.

    I guess C code isn't programming.

    • Note that the prompt wasn't fed to another LLM, but to the same one. "I wrote a program in C and gave it to GCC. Then I gave the same program to GCC again and I got a different result" would be more like it.

      6 replies →

    • If there is no error in the compiler implementation and no undefined behavior, the resulting programs are equivalent, and the few differences are mostly just implementation-defined things left to the compiler to decide (though gcc and clang often make the same choices). The performance might also differ. It’s clearly not comparable to the many differences you can get in an LLM’s output.

      6 replies →

  • I think I 100% agree with you, and yet the other day I found myself telling someone "Did you know OpenClaw was written with Codex and not Claude Code?", and I really think I meant it in the same sense I'd mean a programming language or framework, and I only noticed what I'd said a few minutes later.

  • Prompting isn't programming. Prompting is managing.

    • Is it?

      If I know the system I'm designing and I'm steering, isn't it the same?

      We're not punching cards anymore, yet we're still telling the machines what to do.

      Regardless, the only thing that matters is to create value.

    • Interesting how the definition “real programming” keeps changing. I’m pretty sure when the assembler first came, bare metal machine code programmers said “this isn’t programming”. And I can imagine their horror when the compiler came along.

      1 reply →

  • All programming achieves the same outcome: it asks the OS/machine to set aside some memory to hold salient values and mutates those values in line with a mathematical recipe.

    Functions like:

    updatesUsername(string) returns result

    ...can be turned into generic functional euphemism

    takeStringRtnBool(string) returns bool

    ...same thing. Context can be established by the data passed in and by external system interactions (updating user values, an inventory of widgets).

    As workers, SWEs are just obfuscating how repetitive their effort is to people who don't know better.

    The era of purely data-driven systems has arrived. In line with the push to dump OOP, we're dumping irrelevant context from the code altogether: https://en.wikipedia.org/wiki/Data-driven_programming

  • [flagged]

    • I'm not sure I will ever understand the presentation of "inevitable === un-criticizable" as some kind of patent truth. It's so obviously fallacious that I'm not sure what could even drive a human to write it, and yet there it is, over and over and over.

      Lots of horrifying things are inevitable because they represent "progress" (where "progress" means "good for the market", even if it's bad for the idea of civilization), and we, as a society, come to adapt to them, not because they are good, but because they are.

  • >"I prompted it like this"

    >"I gave it the same prompt, and it came out different"

    1:1 reproducibility is much easier in LLMs than in software building pipelines. It's just not guaranteed by major providers because it makes batching less efficient.
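
    For example, with a locally hosted model, greedy decoding leaves nothing for sampling to vary (a rough sketch assuming the transformers and torch packages and a small model like gpt2; low-level kernel nondeterminism across hardware can still leak in):

       import torch
       from transformers import AutoModelForCausalLM, AutoTokenizer

       torch.manual_seed(0)
       tok = AutoTokenizer.from_pretrained("gpt2")
       model = AutoModelForCausalLM.from_pretrained("gpt2")

       inputs = tok("def reverse_list(xs):", return_tensors="pt")
       out = model.generate(**inputs, max_new_tokens=40, do_sample=False)  # greedy decoding, no RNG draw
       print(tok.decode(out[0]))  # same completion every run, given the same weights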

    • > 1:1 reproducibility is much easier in LLMs than in software building pipelines

      What’s a ‘software building pipeline’ in your view here? I can’t think of parts of the usual SDLC that are less reproducible than LLMs, could you elaborate?

      2 replies →

These models are nothing short of astounding.

I can write a spec for an entirely new endpoint, and Claude figures out all of the middleware plumbing and the database queries. (The catch: this is in Rust and the SQL is raw, without an ORM. It just gets it. I'm reviewing the code, too, and it's mostly excellent.)

I can ask Claude to add new data to the return payloads - it does it, and it can figure out the cache invalidation.

These models are blowing my mind. It's like I have an army of juniors I can actually trust.

  • I'm not sure I'd call agents an army of juniors. More like a high school summer intern who has infinite time to do deep dives into StackOverflow but doesn't have nearly enough programming experience yet to have developed a "taste" for good code

    In my experience, agentic LLMs tend to write code that is very branchy, with high cyclomatic complexity. They don't follow DRY principles unless you push them very hard in that direction (and even then not always), and sometimes they do things that just fly in the face of common sense. An example of that last part: I was writing some Ruby tests with Opus 4.6 yesterday, and I got dozens of tests that amounted to this:

       x = X.new
       assert x.kind_of?(X)
    

    This is of course an entirely meaningless check. But if you aren't reading the tests and you just run the test job and see hundreds of green check marks and dozens of classes covered, it could give you a false sense of security

    • > In my experience, agentic LLMs tend to write code that is very branchy with cyclomatic complexity

      You are missing the forest for the trees. Sure, we can find flaws in the current generation of LLMs. But they'll be fixed. We have a tool that can learn to do anything as well as a human, given sufficient input.

  • > this is in Rust and the SQL is raw, without an ORM.

    Where's the catch? SQL is an old technology; surely an LLM is good with it.