Comment by deepsquirrelnet
10 hours ago
At this point, I am starting to feel like we don’t need new languages, but new ways to create specifications.
I have a hypothesis that an LLM can act as a pseudocode-to-code translator, where the pseudocode can tolerate a mixture of code-like and natural-language specification. The benefit is that it formalizes the human as the specifier (which must be done anyway) and the LLM as the code writer. This might also enable lower-resource “non-frontier” models to be more useful. Additionally, it tolerates syntax mistakes or, in the worst case, falls back to plain natural language.
In other words, I think LLMs don’t need new languages, we do.
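For illustration, a sketch of what that mixed-mode input and a plausible translation could look like (every name here is invented):

    # Hypothetical mixed-mode pseudocode: code-like where precision matters,
    # natural language where it doesn't:
    #
    #   def dedupe_users(users):
    #       group users by lowercased email
    #       for each group, keep the most recently active record
    #       return kept records sorted by signup date
    #
    # One plausible LLM translation of that spec:
    from itertools import groupby

    def dedupe_users(users):
        """Keep the most recently active record per (case-insensitive) email."""
        by_email = sorted(users, key=lambda u: u["email"].lower())
        kept = [
            max(group, key=lambda u: u["last_active"])
            for _, group in groupby(by_email, key=lambda u: u["email"].lower())
        ]
        return sorted(kept, key=lambda u: u["signup_date"])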
What we need is a programming language that defines the diff to be applied upon the existing codebase to the same degree of unambiguity as the codebase itself.
That is, in the same way that event sourcing materializes state from a series of change events, this language would materialize a codebase from a series of "modification instructions". Different models may materialize a different codebase from the same series of instructions (like compilers), or under different "environmental factors" (e.g. the database or cloud provider that's available). It's as if the codebase itself is no longer the important artifact; the sequence of prompts is. You would also use this sequence of prompts to generate a testing suite completely independent of the codebase.
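A toy sketch of that materialization step (the instruction format and file contents are invented purely to show the fold):

    from functools import reduce

    # A log of "modification instructions"; the codebase is whatever state
    # they fold up to, exactly like rebuilding state in event sourcing.
    instructions = [
        ("create", "app.py", "def main(): ..."),
        ("append", "app.py", "\nif __name__ == '__main__': main()"),
        ("create", "db.py", "# use whichever database the environment provides"),
    ]

    def apply(codebase, instr):
        op, path, payload = instr
        if op == "create":
            codebase[path] = payload
        elif op == "append":
            codebase[path] = codebase.get(path, "") + payload
        return codebase

    # Materialize: the instruction log is the artifact; the files are derived.
    codebase = reduce(apply, instructions, {})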
I am working on that: https://github.com/gritzko/librdx. Conflict-free merges and overlay branches (i.e. freely attachable/detachable with a click) were the pie in the sky of the CRDT community for maybe 15 years. My current approach is an RDX tree CRDT that maps directly onto the program's AST, like a CRDT DOM for the AST, because line-based diffs are too clumsy for that.
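Not librdx's actual API, but a minimal sketch of the general idea: give every AST node a stable identity, so edits address nodes instead of text lines:

    import uuid
    from dataclasses import dataclass, field

    @dataclass
    class AstNode:
        kind: str                                   # e.g. "function", "call"
        node_id: str = field(default_factory=lambda: uuid.uuid4().hex)
        children: list["AstNode"] = field(default_factory=list)

    # An edit names a node_id rather than a line number, so two branches
    # that touch different subtrees merge cleanly by construction.
    def add_child(parent: AstNode, child: AstNode, index: int) -> None:
        parent.children.insert(index, child)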
Back in the day, JetBrains tried revision-controlling AST trees, or PSI nodes in their parlance. That project was cancelled as it turned into a research challenge; that was 10 years ago or so. At this point things may work out, time will tell.
Was it cancelled? I thought MPS works that way.
I think this could be very useful even for regular old programming. We could treat the diffs to the code as the main source of truth (instead of the textual snapshot each diff creates).
Jonathan Edwards (Subtext lang) has a lot of great research on this.
I think this confuses two different things:
- LLMs can act as pseudocode to code translators (they are excellent at this)
- LLMs still create bugs and make errors, and a reasonable hypothesis is that they do so at a rate in direct proportion to the "complexity" or "buggedness" of the underlying language.
In other words, give an AI a footgun and it will happily use it, unawares. That doesn't mean, however, that it can't rapidly turn your pseudocode into code.
None of this means that LLMs can magically correct your pseudocode at all times if your logic is vastly wrong for your goal, but I do believe they'll benefit immensely from new languages that reduce the kind of bugs they make.
This is the moment we can create these languages. LLMs can optimize for things that humans can't, so it seems possible to design new languages that reduce bugs in ways that work for LLMs but are less effective for people (due to syntax, ergonomics, verbosity, anything else).
This is crucially important. Why? Because 99% of all code written in the next two decades will be written by AI. And we will also produce 100x more code than has ever been written before (because the cost of doing so has dropped essentially to zero). This means that, short of some revolution in language technology, the number of bugs and vulnerabilities we can expect will also grow 100x.
That's why ideas like this are needed.
I believe in this too, and have been working on something also targeting LLMs specifically since mid-to-late November last year. A business model will make such a language sustainable.
Say you have this new language, with only a tiny number of examples out there. How do the SOTA labs train on your language? With sufficient examples, it can generate code that gets compiled and run, and that feeds into a loop of improvement; but how do you get there? How do you bootstrap that? Never mind the dollar cost, what does it offer over having an LLM generate code in Python or JavaScript, then having it rewrite that in golang/rust/c++ as needed/possible for performance or whatever reason?
It sounds like your plan is for it to write fewer bugs in NewLang, but that seems a bit hard to achieve in the abstract. Judging from bugs I've fixed in generated code, early LLM output was just bad code; multiple variables for the same thing, especially. Recently they've gotten better at that, but it still happens.
For a concrete example, take any app dealing with points in time, which sometimes have a date attached and sometimes do not. And also: what are timezones? The complexity is there because it depends on what you're trying to do. An alarm clock is different from a calendar is different from a pomodoro timer. How are you going to reduce the bugged-ed-ness of that without making one of those use cases more complicated than it needs to be, given access to various primitives?
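To make that concrete, here is roughly how the "same" concept splits three ways in Python (a sketch, not a prescription):

    from datetime import datetime, time, timezone
    from zoneinfo import ZoneInfo

    # Alarm clock: wall-clock time, no date, no timezone; "07:00" should
    # ring at 07:00 wherever the user happens to be.
    alarm = time(7, 0)

    # Calendar event: local time pinned to a timezone, so the absolute
    # instant moves correctly if that zone's DST rules change.
    meeting = datetime(2026, 3, 12, 9, 30, tzinfo=ZoneInfo("America/New_York"))

    # Pomodoro timer: a duration from an absolute instant; timezones are
    # irrelevant, so UTC is the right anchor.
    started = datetime.now(timezone.utc)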
Your hypothetical misses praxis: in my experience an LLM can pick up any new syntax with ease. From a few examples, it can generate more. With a compiler (even a partial one covering limited syntax), it can correct itself. It soon becomes fluent simply from the context of your codebase. You don't need to "train" an LLM to recognize language syntax; it picks it up effortlessly.
Or maybe my lang just has LLM-easy syntax (which would be good), but I think this is more just par for the course for LLMs, bud.
Ah, people are starting to see the light.
This is something that could be distilled from industries like aviation, where the specification of software (requirements, architecture documents, etc.) is even more important than the software itself.
The problem is that natural language is in itself ambiguous, and people don't really grasp the importance of clear specification (I've lost count of how many times I've had to repeat: put units and tolerances on any limits you specify in requirements).
Another problem is that natural language doesn't have "defaults": if you don't specify something, it's open to interpretation. And people _will_ interpret something instead of saying "yep, I don't know this".
You can use LLMs as specification compilers. They are quite good at finding ambiguities in specs and writing out lists of questions for the author to answer, or inferring sensible defaults in explicitly called-out ways.
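As a sketch of that loop (llm() stands in for whatever completion API you use; the prompt wording is illustrative):

    AMBIGUITY_PROMPT = """You are reviewing a software specification.
    List every ambiguity, missing unit, missing tolerance, or unstated
    default as a numbered question for the spec's author.

    Specification:
    {spec}
    """

    def compile_spec(spec: str, llm) -> str:
        questions = llm(AMBIGUITY_PROMPT.format(spec=spec))
        if questions.strip():
            # The spec isn't "compilable" until the author answers these
            # or explicitly pins down defaults.
            return questions
        return llm(f"Generate code implementing exactly this spec:\n{spec}")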
Time to bring out the flowcharts again!
I’ve been on a similar train of thought. Just last weekend I built a little experiment, using LLMs to highlight pseudocode syntax:
https://x.com/danielvaughn/status/2011280491287364067?s=46
This is the approach that Agint takes. We infer the structure of the code first, top-down, as a graph, then add in types, then interpret the types as in/out function signatures, and then "inpaint" the functions for codegen.
I'm actually building this and will release it early next month. I've added a URL to my profile to watch (it should be up later this week). It will be open source.
And so it comes full circle XD.
>>new ways to create specifications.
That's programming languages again. The real issue with LLMs now is that it doesn't matter if they can generate code quickly; someone still has to read, verify, and test it.
Perhaps we need a terse programming language, one that can be read quickly and verified. You could call that a specification.
Yes, essentially a higher level programming language than what we currently have. A programming language that doesn't have strict syntax, and can be expressed with words or code. And like any other programming language, it includes specifications for the tests and expectations of the result.
The programming language can look more like code in parts where the specification needs to be very detailed. I think people can get intuition about where the LLM is unlikely to be successful. It can stay low-detail for boilerplate or code that is simple to describe, as in the sketch below.
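A sketch of what such a mixed-detail spec could look like, with its expectations attached (all names invented):

    SPEC = """
    Precise (must hold exactly):
        rate_limit(user) allows at most 100 requests per rolling 60-second
        window, and returns 429 with a Retry-After header once exceeded.

    Loose (implementer's choice):
        Store the counters wherever is convenient; in-memory is fine for v1.
    """

    # The spec ships with its own expectations, like any other program:
    def test_rate_limit(client):
        for _ in range(100):
            assert client.get("/api").status_code == 200
        resp = client.get("/api")
        assert resp.status_code == 429
        assert "Retry-After" in resp.headers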
You should be able to alter and recompile the specification, unlike the wandering prompt, which makes changes faster than normal version-control practices can keep up with.
Perhaps there's a world where reading the specification rather than the compiled code is sufficient to keep cognitive load at reasonable levels.
At the very least, you can read the compiled code until you can establish your own validation set and build statistical expectations about your domain. Fundamentally, these models will always be statistical in nature, so we probably need to start operating inside that kind of framework if we really want to be professional about it.
We already have exceptionally high-level languages, like Inform7 [0]. The concept doesn't work all that well. Terseness is a value; it's why we end up with so many symbol-heavy languages. Yes, there are tradeoffs, but that is the whole of computer science.
We didn't end up with Lean and Rust for lack of understanding of how to create strong specifications. Pascal-like languages fell out of favour despite having higher readability.
[0] https://learnxinyminutes.com/inform7/
Simply put, whatever you write should produce the same output regardless of how many times you execute it. The more verbose you make it, the more pointless it becomes.
The more terse, the better.
LLMs work great in a closed loop, where they can self-correct, but we don't have a reliable way to lint and test specs. We need a new language for that.
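Roughly the loop in question (a sketch; lint_spec, run_tests, and llm are all hypothetical, and lint_spec is exactly the piece we're missing):

    def closed_loop(spec, llm, lint_spec, run_tests, max_iters=5):
        code = llm(f"Implement this spec:\n{spec}")
        for _ in range(max_iters):
            issues = lint_spec(spec) + run_tests(code)
            if not issues:
                return code
            code = llm(f"Fix these issues:\n{issues}\n\nCode:\n{code}")
        raise RuntimeError("did not converge")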
So in this case an LLM would just be a less reliable compiler? What's the point? If you have to formally specify your program, we already have tools for that; no boiling the ocean required.