Comment by amelius

11 hours ago

It's a reminder of how archaic the systems we use are.

File descriptors are like handing pointers to the users of your software. At least allow us to use names instead of numbers.

And sh/bash's syntax is so weird because the programmer at the time thought it was convenient to do it like that. Nobody ever asked a user.

I've long wanted easy, trivial multichannel i/o with duplication

I want to be able to route x independent input and y independent output trivially from the terminal

Proper i/o routing

It shouldn't be hard, it shouldn't be unsolved, and it shouldn't be esoteric

At the time, the users were the programmers.

  • This is misleading because you use plural for both and I'm sure most of these UX missteps were _each_ made by a _single_ person, and there were >1 users even at the time.

  • arguably if you're using the CLI they still are

    • Yeah but now they're using npm to install a million packages to do things like tell if a number is greater than 10000. The chances of the programmer wanting to understand the underlying system they are using is essentially nil.

    • Yea, they are just much higher level programmers… most programmers don’t know the low level syscall apis.

    • nah, we have long had other disciplines using the CLI who do not write their own software, e.g. sysadmins

> At least allow us to use names instead of numbers.

You can for the destination. That's the whole reason you need the "&": to tell the shell the destination is not a named file (which itself could be a pipe or socket). And by default you don't need to specify the source fd at all. The intent is that stdout is piped along but stderr goes directly to your tty. That's one reason they are separate.

And for those saying "<" would have been better: that is used to read from the RHS and feed it as input to the LHS so it was taken.

It should be a lesson to learn on how simple, logical and reliable tools can last decades.

  • Bash syntax is anything but simple or logical. Just look at the insane if-statement syntax. Or how the choice of quotes fundamentally changes behavior. Argument parsing, looping, the list goes on.

    • if statements are pretty simple

      if $command; then <thing> else <thing> fi

      You may be complaining about the syntax for the test command specifically or bash’s [[ builtin

      Also the choice of quotes changing behavior is a thing in:

      1. JavaScript/typescript 2. Python 3. C/C++ 4. Rust

      In some cases it’s the same difference, eg: string interpolation in JavaScript with backticks

      2 replies →

  • It's more like how the need for backwards compatibility prevents bad interfaces from ever getting improved.

I quite like how archaic it is. I am turned off by a lot of modern stuff. My shell is nice and predictable. My scripts from 15 years ago still work just fine. No, I don't want it to get all fancy, thanks.

You can do:

   2>/dev/stdout

Which is about the same as `2>&1` but with a friendlier name for STDOUT. And this way `2> /dev/stdout`, with the space, also works, whereas `2> &1` doesn't which confuses many. But it's behavior isn't exactly the same and might not work in all situations.

And of course I wish you could use a friendlier name for STDERR instead of `2>`

  • > You can do: > > 2>/dev/stdout

    The situation where this is going to cause confusion is when you do this for multiple commands. It looks like they're all writing to a single file. Of course, that file is not an ordinary file - it's a device file. But even that isn't enough. You have to know that each command sees its own incarnation of /dev/stdout, which refers to its own fd1.

The conveniences also mean that there is more than ~one~ ~two~ several ways to do something.

Which means that reading someone else's shell script (or awk, or perl, or regex) is INCREDIBLY inconvenient.

  • Yes. There are many reasons why one shouldn't use sh/bash for scripting.

    But my main reason is that most scripts break when you call them with filenames that contain spaces. And they break spectacularly.

    • Counter reason in favor is that you can always count on it being there and working the same way. Perl is too out of fashion and python has too many versioning/library complexities.

      You have to write the crappy sh script once but then you get simple, easy usage every time. (If you're revising the script frequently enough that sh/bash are the bottleneck, then what you have is a dev project and not a script, use a programming language).

    • You're not wrong, but there's fairly easy ways to deal with filenames containing spaces - usually just enclosing any variable use within double quotes will be sufficient. It's tricker to deal with filenames that contain things such as line breaks as that usually involves using null terminated filenames (null being the only character that is not allowed in filenames). e.g find . -type f -print0

      3 replies →

They're more like capabilities or handles than pointers. There's a reason in Rust land many systems use handles (indices to a table of objects) in absence of pointer arithmetic.

In the C API of course there's symbolic names for these. STDIN_FILENO, STDOUT_FILENO, etc for the defaults and variables for the dynamically assigned ones.

  • What they point to are capabilities, but the integer handles that user space gets are annoyingly like pointers. In some respects, better, since we don’t do arithmetic on them, but in others, worse: they’re not randomized, and I’ve never come across a sanitizer (in the ASan sense) for them, so they’re vulnerable to worse race condition and use-after-free issues where data can be quietly sent to the entirely wrong place. Unlike raw pointers’ issues, this can’t even be solved at a language level. And maybe worst of all, there’s no bug locality: you can accidentally close the descriptor backing a `FILE*` just by passing the wrong small integer to `close` in an unrelated part of the program, and then it’ll get swapped out at the earliest opportunity.

    • BITD the one "fd sanitizer" I ever encountered was "try using the code on VxWorks" which at the time was "posix inspired" at best - fds actually were pointers, so effectively random and not small integers. It didn't catch enough things to be worth the trouble, but it did clean up some network code (ISTR I was working on SNTP and Kerberos v4 and Kerberized FTP when I ran into this...)

> At least allow us to use names instead of numbers.

You can use /dev/stdin, /dev/stdout, /dev/stderr in most cases, but it's not perfect.

  • > You can use /dev/stdin, /dev/stdout, /dev/stderr in most cases

    Never ever write code that assumes this. These dev shorthands are Linux specific, and you'll even need a certain minimum Linux version.

    I cringe at the amount of shell scripts that assume bash is the system interpreter, and not sh or ksh.

    Always assume sh, it's the most portable.

    Linux != Unix.

    • It's a waste of time unless you're specifically targeting and testing mac, all of the BSDs, various descendants of Solaris, and other flavors of Unix. I wrote enough "portable shell" to run into so many quirks and slight differences in flags, in how different tools handle e.g. SIGPIPE.

      Adding a new feature in a straightforward way often makes it work only on 4/7 of the operating systems you're trying to support. You then rewrite it in a slightly different way (because it's shell — there's always 50 ways to do the same thing). This gets you to 5/7 working systems, but breaks one that previously worked. You rewrite it yet another way, fixing the new breakage, but another one breaks. Repeat this over and over again, trying to find an implementation that works everywhere, or start adding workarounds for each system. Spend an hour on a feature that should have taken two minutes.

      If it's anything remotely complicated, and you need portability, then use perl/python/go.

    • Actually, while the Actual Nodes are a linux thing, bash itself implements (and documents) them directly (in redirections only), along with /dev/tcp and /dev/udp (you can show with strace that bash doesn't reference the filesystem for these, even if they're present.)

      So, you're not wrong, but...

    • lol truly informative and clearly something no one here knew. But your terminology is inaccurate. Please change it to GNU/Linux != Unix

Who do you imagine the users were back when it was being developed?

  • People who were not that one programmer?

    Even if you're a programmer, that doesn't mean you magically know what other programmers find easy or logical.

> bash's syntax is so weird

What should be the syntax according to contemporary IT people? JSON? YAML? Or just LLM prompt?

  • Nushell, Powershell, Python, Ruby, heck even Perl is better. Shell scripting is literally the worst language I've ever seen in common use. Any realistic alternative is going to be better.

  • Trying to be language agnostic: it should be as self-explanatory as possible. 2>&1 is all but.

    Why is there a 2 on the left, when the numbers are usually on the right. What's the relationship between 2 and 1? Is the 2 for std err? Is that `&` to mean "reference"? The fact you only grok it if you know POSIX sys calls means it's far from self explanatory. And given the proportion of people that know POSIX sys calls among those that use Bash, I think it's a bit of an elitist syntax.

    • POSIX has a manual for shell. You can read 99% of it without needing to know any syscalls. I'm not as familiar with it but Bash has an extensive manual as well, and I doubt syscall knowledge is particularly required there either.

      If your complaint is "I don't know what this syntax means without reading the manual" I'd like to point you to any contemporary language that has things like arrow functions, or operator overloading, or magic methods, or monkey patching.

  • Honestly, Python with the "sh" module is a lot more sane.

    • Is it more sane, or is it just what you are used to?

      Python doesn't really have much that makes it a sensible choice for scripting.

      Its got some basic data structures and a std-lib, but it comes at a non-trivial performance cost, a massive barrier to getting out of the single thread, and non-trivial overhead when managing downstream processes. It doesn't protect you from any runtime errors (no types, no compile checks). And I wouldn't call python in practice particularly portable...

      Laughably, NodeJS is genuinely a better choice - while you don't get multithreading easily, at least you aren't trivially blocked on IO. NodeJS also has pretty great compatibility for portability; and can be easily compiled/transformed to get your types and compile checks if you want. I'd still rather avoid managing downstream processes with it - but at least you know your JSON parsing and manipulation is trivial.

      Go is my goto when I'm reaching for more; but (ba)sh is king. You're scripting on the shell because you're mainly gluing other processes together, and this is what (ba)sh is designed to do. There is a learning curve, and there are footguns.