Comment by chubot
6 hours ago
A common problem I've noticed is that if you took certain computer science courses, you may have a preconceived notion of how to parse programming languages, and the shell language doesn't quite fit that model.
I have seen this misconception many times
In Oils, we have some pretty minor elaborations of the standard model, and it makes things a lot easier
How to Parse Shell Like a Programming Language - https://www.oilshell.org/blog/2019/02/07.html
Everything I wrote there still holds, although that post could use some minor updates (and OSH is the most bash-compatible shell, and more POSIX-compatible than /bin/sh on Debian - e.g. https://pages.oils.pub/spec-compat/2025-11-02/renamed-tmp/sp... )
---
To summarize that, I'd say that doing as much work as possible in the lexer, with regular languages and "lexer modes", drastically reduces the complexity of writing a shell parser
And it's not just one parser -- shell actually has 5 to 15 different parsers, depending on how you count
I often show this file to make that point: https://oils.pub/release/0.37.0/pub/src-tree.wwz/_gen/_tmp/m...
(linked from https://oils.pub/release/0.37.0/quality.html)
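To make the "lexer modes" idea concrete, here is a minimal Python sketch. It is not Oils' actual lexer: the mode names (OUTER, DQ), token names, and regexes are all made up for illustration. Each mode is a small regular language, and the caller switches modes when it sees a double quote, so the same characters lex differently inside and outside quotes.

```python
import re

# Hypothetical modes and token names, for illustration only.
# Each mode is an ordered list of (regex, token type) pairs.
MODES = {
    "OUTER": [
        (re.compile(r"[ \t]+"), "Space"),
        (re.compile(r'"'), "LeftDQ"),
        (re.compile(r"\$[A-Za-z_][A-Za-z0-9_]*"), "VarSub"),
        (re.compile(r"\$"), "Lit"),            # a lone $ is literal
        (re.compile(r'[^ \t"$]+'), "Lit"),
    ],
    "DQ": [
        (re.compile(r'"'), "RightDQ"),
        (re.compile(r"\$[A-Za-z_][A-Za-z0-9_]*"), "VarSub"),
        (re.compile(r"\$"), "Lit"),
        (re.compile(r'[^"$]+'), "Lit"),        # spaces are literal inside quotes
    ],
}


def next_token(mode, s, pos):
    """Return (token type, text, new position) using the rules of one mode."""
    for pattern, tok_type in MODES[mode]:
        m = pattern.match(s, pos)
        if m:
            return tok_type, m.group(0), m.end()
    raise ValueError("no rule matched at position %d" % pos)


def tokenize(s):
    """Lex a string, switching modes when a double quote opens or closes."""
    mode, pos, tokens = "OUTER", 0, []
    while pos < len(s):
        tok_type, text, pos = next_token(mode, s, pos)
        if tok_type == "LeftDQ":
            mode = "DQ"
        elif tok_type == "RightDQ":
            mode = "OUTER"
        tokens.append((tok_type, text))
    return tokens
```

For `echo "hi $name" x`, the space after `echo` comes out as its own Space token, while the space after `hi` stays inside a Lit token because the lexer is in DQ mode at that point. A real shell needs many more modes than two, but the mode switch is the only part that isn't a plain regular language.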
Fine-grained heterogeneous algebraic data types also help. Shells in C tend to use a homogeneous command* and word* kind of representation
https://oils.pub/release/0.37.0/pub/src-tree.wwz/frontend/sy... (~700 lines of type definitions)
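As a rough illustration of what "fine-grained heterogeneous" means, here is a Python sketch using dataclasses. The type names below (Literal, VarSub, DoubleQuoted, Simple, Pipeline, If) are hypothetical and far smaller than the real schema linked above. The point is only that each kind of node has its own type with its own fields, instead of one generic node struct with a tag.

```python
from dataclasses import dataclass
from typing import List, Union


# Word parts: each variant carries only the fields it needs.
@dataclass
class Literal:
    text: str


@dataclass
class VarSub:
    name: str


@dataclass
class DoubleQuoted:
    parts: List[Union[Literal, VarSub]]


WordPart = Union[Literal, VarSub, DoubleQuoted]
Word = List[WordPart]


# Commands: again one type per variant (hypothetical names).
@dataclass
class Simple:
    words: List[Word]


@dataclass
class Pipeline:
    children: List["Command"]


@dataclass
class If:
    cond: "Command"
    body: "Command"


Command = Union[Simple, Pipeline, If]


def part_vars(part: WordPart) -> List[str]:
    """Collect variable names referenced by one word part."""
    if isinstance(part, VarSub):
        return [part.name]
    if isinstance(part, DoubleQuoted):
        return [n for p in part.parts for n in part_vars(p)]
    return []  # Literal


def command_vars(cmd: Command) -> List[str]:
    """Collect variable names referenced anywhere in a command tree."""
    if isinstance(cmd, Simple):
        return [n for w in cmd.words for p in w for n in part_vars(p)]
    if isinstance(cmd, Pipeline):
        return [n for c in cmd.children for n in command_vars(c)]
    if isinstance(cmd, If):
        return command_vars(cmd.cond) + command_vars(cmd.body)
    raise TypeError("unknown command node: %r" % (cmd,))
```

With separate types, a tree walker like `command_vars` only ever touches fields that exist on the variant it is handling, and a type checker can flag a forgotten case. With a single homogeneous node type, that invariant lives in comments and conventions instead.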