Comment by Verdex
4 days ago
Thanks for giving details about your workflow. At least for me it helps a lot in these sorts of discussions.
That said, it's interesting to me that the original post mentioned LLMs "one-shot"ing parsers, while this description sounds like a much more in-depth process.
"And there is no parser generator [...] that starts with examples [...]"
People. People can generate parsers by starting with examples. Which, again, is more in line with the original "one-shot parsers" comment.
If people are finding LLMs useful as part of a process for parser generation then I'm glad. (And I mean, testing parsers is pretty painful to me, so I'm interested in the test case generation.) However, I'm much more interested in the existence or non-existence of one-shot parser generation.
I recently did something similar, but different: I gave Claude some code examples of a Rust-like language, and it wrote a recursive descent parser for me. That was a one-shot, though it's a very simple language.
After more features were added, I decided I wanted BNF for it, so it went and wrote it all out correctly, after the fact, from the parser implementation.
Can you give more info?
How big of a number is "some"?
Also what kind of prompts were you feeding it? Did you describe it as Rust like? Anything else you feel is relevant.
[Is there a GitHub link? I'm more than happy to do the detective work.]
Like three or four. It's a very simple language: a main function whose value is the error code, functions of one argument returning one value, only ints, basic control flow and math.
I just opened the repo, here's the commit that did what I'm talking about: https://github.com/steveklabnik/rue/commit/5742e7921f241368e...
Well, the second part anyway, with the grammar. The lexer work starts at https://github.com/steveklabnik/rue/commit/a9bce389ea358365f...; it was basically this program.
If I wrote down the prompts, I'd share them, but I didn't.
Please ignore the large amount of LLM bullshit in here. Since the repo was private while I did this, I wasn't really worried about how annoying and slightly wrong the README etc. were. HEAD is better in that regard.
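For anyone following along, here is a rough sketch of what a recursive descent parser for a language of that shape could look like: only ints, functions of at most one argument, if/else, basic math, and a main function whose value becomes the exit code. The concrete syntax, the grammar in the comments, and every name below (`tokenize`, `Parser`, `evaluate`, `run`) are assumptions made up for illustration. None of this is taken from the rue repo or from what Claude actually generated.

```python
import operator
import re

# Hypothetical grammar, invented for this sketch (not rue's actual grammar):
#   program  := function*
#   function := "fn" IDENT "(" IDENT? ")" block
#   block    := "{" expr "}"
#   expr     := "if" expr block "else" block | compare
#   compare  := sum (("<" | "==") sum)?
#   sum      := term (("+" | "-") term)*
#   term     := atom (("*" | "/") atom)*
#   atom     := INT | IDENT "(" expr ")" | IDENT | "(" expr ")"
KEYWORDS = {"fn", "if", "else"}
TOKEN_RE = re.compile(r"\s*(?:(\d+)|([A-Za-z_]\w*)|(==|[-+*/<(){}]))")
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul,
       "/": operator.floordiv,
       "<": lambda a, b: int(a < b), "==": lambda a, b: int(a == b)}

def tokenize(src):
    src, tokens, pos = src.rstrip(), [], 0
    while pos < len(src):
        m = TOKEN_RE.match(src, pos)
        if m is None:
            raise SyntaxError(f"unexpected character at offset {pos}")
        num, ident, sym = m.groups()
        if num is not None:
            tokens.append(("int", int(num)))
        elif ident is not None:
            tokens.append(("kw" if ident in KEYWORDS else "ident", ident))
        else:
            tokens.append(("sym", sym))
        pos = m.end()
    return tokens

class Parser:
    def __init__(self, tokens):
        self.tokens, self.pos = tokens, 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else ("eof", None)

    def next(self):
        tok = self.peek()
        self.pos += 1
        return tok

    def expect(self, kind, value=None):
        tok = self.next()
        if tok[0] != kind or (value is not None and tok[1] != value):
            raise SyntaxError(f"expected {value or kind}, got {tok[1]!r}")
        return tok[1]

    def program(self):
        funcs = {}  # function name -> (parameter name or None, body AST)
        while self.peek()[0] != "eof":
            self.expect("kw", "fn")
            name = self.expect("ident")
            self.expect("sym", "(")
            param = self.expect("ident") if self.peek()[0] == "ident" else None
            self.expect("sym", ")")
            funcs[name] = (param, self.block())
        return funcs

    def block(self):
        self.expect("sym", "{")
        body = self.expr()
        self.expect("sym", "}")
        return body

    def expr(self):
        if self.peek() == ("kw", "if"):
            self.next()
            cond, then = self.expr(), self.block()
            self.expect("kw", "else")
            return ("if", cond, then, self.block())
        return self.binary(self.sum, ("<", "=="), repeat=False)

    def binary(self, operand, ops, repeat=True):
        # One precedence level: operand (op operand)*, left-associative.
        left = operand()
        while self.peek()[0] == "sym" and self.peek()[1] in ops:
            left = (self.next()[1], left, operand())
            if not repeat:
                break
        return left

    def sum(self):
        return self.binary(self.term, ("+", "-"))

    def term(self):
        return self.binary(self.atom, ("*", "/"))

    def atom(self):
        kind, value = self.next()
        if kind == "int":
            return ("int", value)
        if kind == "ident" and self.peek() == ("sym", "("):
            self.next()
            arg = self.expr()
            self.expect("sym", ")")
            return ("call", value, arg)
        if kind == "ident":
            return ("var", value)
        if (kind, value) == ("sym", "("):
            inner = self.expr()
            self.expect("sym", ")")
            return inner
        raise SyntaxError(f"unexpected token {value!r}")

def evaluate(node, funcs, env):
    tag = node[0]
    if tag == "int":
        return node[1]
    if tag == "var":
        return env[node[1]]
    if tag == "call":
        param, body = funcs[node[1]]
        return evaluate(body, funcs, {param: evaluate(node[2], funcs, env)})
    if tag == "if":
        branch = node[2] if evaluate(node[1], funcs, env) else node[3]
        return evaluate(branch, funcs, env)
    return OPS[tag](evaluate(node[1], funcs, env), evaluate(node[2], funcs, env))

def run(src):
    """Parse the program and return the value of main (the 'exit code')."""
    funcs = Parser(tokenize(src)).program()
    return evaluate(funcs["main"][1], funcs, {})
```

With a made-up program like `fn fact(n) { if n < 2 { 1 } else { n * fact(n - 1) } } fn main() { fact(5) - 78 }`, `run` returns 42. Each grammar rule maps to one method, which is part of why a language this small is a plausible candidate for a one-shot: there is very little room for the spec and the implementation to drift apart.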
I guess I don't really understand the goal of "one-shot" parser generation, since I can't even do that as a human using a parser generator! There's always an iterative process, as I find out how the language I wanted isn't quite the language I defined. Having somebody or something else write tests actually helps with that problem, as it'll exercise grammar cases outside my mental happy path.
The comment that started this whole thread off mentioned LLMs one-shotting parsers. I didn't think an LLM could one-shot a parser, and I'm interested in parsers, which is why I asked for more info.
It's not a goal of mine, but because of my interest in parsing I wanted to know whether this was something that was actually happening or whether it was hyperbole.
Well, I mean, it sort of did one-shot the parser in my case (with a few bugs, of course). It just didn't one-shot the parser I wanted, largely because my definition was unclear. It would be interesting to see how it did if I went to the trouble of giving it a truly rigorous prompt.