The Bourne Shell Source Code

6 years ago (tuhs.org)

This is the source to the original Bourne Shell, shipped in Research UNIX v7. You've probably used GNU Bash, which stands for "Bourne-Again SHell."

The Bourne sh is significant for a few reasons. Primarily, its GNU descendant is now installed on billions of devices.

Perhaps of more interest: Bourne's sh source code heavily abuses C macros to look and feel like ALGOL-68. What makes this more impressive is that it predates standardized C: it took real knowledge of the preprocessor to abuse it that thoroughly.

While this is C that compiled just fine in its day, and might still compile mostly without errors under a compiler with a K&R compatibility mode, it's absolutely wild, and it's written to compensate for some of K&R C's shortcomings (see how it handles true vs. false).

I recommend mac.h as a file of particular interest:

https://www.tuhs.org/cgi-bin/utree.pl?file=V7/usr/src/cmd/sh...

Bonus points to anyone who understands what these three lines are doing within it:

     #define LOBYTE 0377
     #define STRIP 0177
     #define QUOTE 0200

Also of interest:

This shell was specifically the reason the International Obfuscated C Code Contest was created; the idea came up just minutes after a session spent wrestling with Bourne's sh (I'm sorry for formatting this as a code block, but the formatting breaks otherwise):

     Q: How did the IOCCC get started?
     A: One day (23 March 1984 to be exact), back Larry 
    Bassel and I (Landon Curt Noll) were working for National 
    Semiconductor's Genix porting group, we were both in our 
    offices trying to fix some very broken code. Larry had been
    trying to fix a bug in the classic Bourne shell (C code 
    #defined to death to sort of look like Algol) and I had been
    working on the finger program from early BSD (a bug ridden 
    finger implementation to be sure). We happened to both 
    wander (at the same time) out to the hallway in Building 7C 
    to clear our heads.

     We began to compare notes: ''You won't believe the code
    I am trying to fix''. And: ''Well you cannot imagine the 
    brain damage level of the code I'm trying to fix''. As well 
    as: ''It more than bad code, the author really had to try to
    make it this bad!''.

    After a few minutes we wandered back into my office 
    where I posted a flame to net.lang.c inviting people to try 
    and out obfuscate the UN*X source code we had just been 
    working on.

  • > Bonus points to anyone who understands what these three lines are doing within it:

         #define LOBYTE 0377
         #define STRIP 0177
         #define QUOTE 0200
    

    That's just 0xff, 0x7f and 0x80 in octal, and the high bit used to be a flag for all kinds of "magic" behaviour back when 7-bit ASCII was the norm...
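    A minimal sketch of the idea (my own illustration, not code from the shell): set the 0200 bit to remember that a character was quoted, strip it back off with 0177 when you want the plain ASCII value, and mask with 0377 to truncate a wider value to a single byte.

         /* Toy illustration (not from sh) of the three masks. */
         #include <stdio.h>

         #define LOBYTE 0377   /* keep only the low 8 bits of a wider value   */
         #define STRIP  0177   /* drop the high bit: back to plain 7-bit ASCII */
         #define QUOTE  0200   /* high bit set: "this character was quoted"    */

         int main(void)
         {
             int c = 'a' | QUOTE;                        /* tag 'a' as quoted */
             printf("quoted? %d\n", (c & QUOTE) != 0);   /* 1                 */
             printf("plain:  %c\n", c & STRIP);          /* a                 */
             printf("byte:   %o\n", (c + 0400) & LOBYTE);/* 341: bit 9 masked */
             return 0;
         }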

  • Everyone talks about the macro abuse, nobody talks about the memory management abuse:

    https://www.in-ulm.de/~mascheck/bourne/segv.html

    > In comp.arch, 05/97, <5m2mu4$guf$1@murrow.corp.sgi.com>, John Mashey writes:

    > For speed, Steve B had used a clever trick of using a memory arena without checking for the end, but placing it so that running off the end would cause a memory fault, which the shell then trapped, allocated more memory, then returned to the instruction that caused the trap and continued. The MC68000 (in order to go fast) had an exception model that broke this (among other things) and caused some grief to a whole generation of people porting UNIX to 68Ks in the early 1980s.
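    The same trick still works on a modern POSIX system. A minimal sketch (my own reconstruction of the idea, not the V7 code): reserve a large region with no access, hand out only the first page, and let the SIGSEGV handler map in more pages so the faulting store simply retries.

         /* Sketch of the fault-and-extend arena (Linux/POSIX, not V7 code). */
         #include <signal.h>
         #include <stdio.h>
         #include <stdlib.h>
         #include <string.h>
         #include <sys/mman.h>

         #define PAGE    4096            /* assume 4 KiB pages for the sketch */
         #define RESERVE (64 * PAGE)

         static char *arena;             /* start of the reserved region      */
         static size_t usable = PAGE;    /* bytes currently mapped read/write */

         static void grow(int sig, siginfo_t *si, void *ctx)
         {
             (void)sig; (void)ctx;
             char *addr = (char *)si->si_addr;
             if (addr < arena || addr >= arena + RESERVE)
                 abort();                /* a real crash, not an arena overrun */
             /* Extend the arena up to and including the faulting page, then
              * return: the instruction that faulted is re-executed and works. */
             size_t need = ((size_t)(addr - arena) / PAGE + 1) * PAGE;
             if (need <= usable)         /* be safe about straddling accesses  */
                 need = usable + PAGE;
             mprotect(arena, need, PROT_READ | PROT_WRITE);
             usable = need;
         }

         int main(void)
         {
             struct sigaction sa = {0};
             sa.sa_sigaction = grow;
             sa.sa_flags = SA_SIGINFO;
             sigaction(SIGSEGV, &sa, NULL);

             /* Reserve a big region, but make only the first page usable. */
             arena = mmap(NULL, RESERVE, PROT_NONE,
                          MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
             mprotect(arena, PAGE, PROT_READ | PROT_WRITE);

             memset(arena, 'x', 3 * PAGE);   /* runs off the end of the arena */
             printf("arena grew to %zu bytes\n", usable);
             return 0;
         }

    The whole scheme depends on being able to restart the faulting instruction exactly, which is precisely what the original MC68000 couldn't guarantee, as Mashey says. (And mprotect isn't strictly async-signal-safe, so treat this as a sketch rather than production code.)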

    • To be fair, that's not exactly a horrible perversion; it just relies on the existence of memory protection. Hell, that's precisely how the stack grows dynamically on Windows: there is a guard page at the bottom, and when it's touched, the kernel allocates it and marks the page below it as the new guard page. The downside is that stack allocations larger than 4 K have to manually probe memory to trigger this behaviour; that's what _chkstk from the CRT does.

      Most such low-level hacks are nowadays reserved exclusively for runtime implementations, probably for the better.

  • Stephen Bourne's love for Algol 68 is also why the control flow in his shell uses reversed keywords such as 'fi' and 'esac' to close blocks: they originated in Algol 68, and he liked them enough to carry them over into the shell language.

    • And the only asymmetry there is 'do' matching with 'done' rather than 'od', because od was already taken by the octal dump program.

  • You don't really have to use code blocks for that. Here's a copy that will be readable on mobile and preserves the formatting from the original (which the code block didn't):

    Q: How did the IOCCC get started?

    A: One day (23 March 1984 to be exact), back Larry Bassel and I (Landon Curt Noll) were working for National Semiconductor's Genix porting group, we were both in our offices trying to fix some very broken code. Larry had been trying to fix a bug in the classic Bourne shell (C code #defined to death to sort of look like Algol) and I had been working on the finger program from early BSD (a bug ridden finger implementation to be sure). We happened to both wander (at the same time) out to the hallway in Building 7C to clear our heads.

    We began to compare notes: "You won't believe the code I am trying to fix". And: "Well you cannot imagine the brain damage level of the code I'm trying to fix". As well as: "It more than bad code, the author really had to try to make it this bad!".

    After a few minutes we wandered back into my office where I posted a flame to net.lang.c inviting people to try and out obfuscate the UN*X source code we had just been working on.

    From: https://www.ioccc.org/faq.html

    • I tried it without, initially.

      How'd you get the censored UNIX to work without italicizing half of the comment incorrectly? I tried escaping it in a few different ways, and no dice.

  • > Algol 68

    By the way, it is an interesting (and simple) exercise to use macros to make C look almost like Oberon-07.
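    For instance, a toy sketch (my own guess at the flavour, nothing rigorous):

         /* C dressed up to read a little like Oberon-07 (toy only). */
         #include <stdio.h>

         #define MODULE(name)  int main(void)
         #define BEGIN         {
         #define END           }
         #define IF            if (
         #define THEN          ) {
         #define ELSE          } else {
         #define WHILE         while (
         #define DO            ) {
         #define RETURN        return

         MODULE(Demo)
         BEGIN
             int i = 0;
             WHILE i < 3 DO
                 IF i == 0 THEN
                     printf("zero\n");
                 ELSE
                     printf("%d\n", i);
                 END
                 i = i + 1;
             END
             RETURN 0;
         END

    It only gets you the surface syntax, of course, but it's striking how far a dozen defines will take you.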

Back when I was writing C compilers, the Bourne Shell was my nemesis. The Bourne shell exercised nearly every "feature" of the C language. Compiling and then running the shell was a great test case for an optimizing compiler and turned up many bugs. But when the compiled code failed, winding back to the underlying C through all of the macros and optimizations was exceptionally difficult. I still remember many a late night trying to figure out what happened. (Thanks, Steve, for many fascinating hours of struggle.)

The whole program relies on definitions in mac.h [1] like:

    #define IF if(
    #define THEN ){
    #define ELSE } else {
    #define ELIF } else if (
    #define FI ;}

    #define BEGIN {
    #define END }
    #define SWITCH switch(
    #define IN ){
    #define ENDSW }
    #define FOR for(
    #define WHILE while(
    ...
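Written against those macros, a function reads like this (a made-up fragment in the same style, not copied from the shell; mac.h also defines a DO/OD pair for loop bodies that the excerpt above elides):

    /* Toy example, self-contained; not a fragment of the real source. */
    #include <stdio.h>

    #define IF      if(
    #define THEN    ){
    #define ELIF    } else if (
    #define ELSE    } else {
    #define FI      ;}
    #define BEGIN   {
    #define END     }
    #define WHILE   while(
    #define DO      ){
    #define OD      ;}

    int classify(int c)
    BEGIN
        IF c == ' '
        THEN return 0;
        ELIF c >= '0' && c <= '9'
        THEN return 1;
        ELSE return 2;
        FI
    END

    int main(void)
    BEGIN
        int i = 0;
        WHILE i < 3
        DO  printf("%d -> %d\n", i, classify("a1 "[i]));
            i++;
        OD
        return 0;
    END

Run it through cc -E and it collapses back into ordinary if/else/while; the braces live inside THEN, ELSE and FI, which is how the style makes a forgotten closing brace hard to write in the first place.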

Isn't this nowadays considered bad practice? After glancing at the code, I can see some advantages, such as not being able to forget a closing brace. Is there any other explanation for why they created a dialect on top of C using the preprocessor?

[1] https://www.tuhs.org/cgi-bin/utree.pl?file=V7/usr/src/cmd/sh...

EDIT: fix English

  • Bourne liked ALGOL. A lot. So much that he was one of the few people who wrote their own ALGOL-68 compiler. Using the preprocessor to feel more at home is a pretty good idea in this case.

    This wasn't particularly popular with anyone who wasn't, well, Bourne, even at the time. I posted an example here: https://news.ycombinator.com/item?id=22199664

  • I'm under the impression that building up higher-level constructs with macros was very common among assembly programmers. Since C was new, whoever wrote this may have come from assembly and brought the habit with them.

  • The author worked on ALGOL 68C and probably intended to reuse the more familiar syntax.

    Nowadays, I would indeed consider it a bad practice to use such macros. Especially if you intend to share the project with anyone else.

As someone who used the C preprocessor to generate C++, Java, and C# from a common source in order to have a common library for native apps, I always appreciate a good bit of preprocessor abuse; it's one of the things that makes C so much fun!
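For anyone curious what that looks like, here is a toy sketch (my own construction, with made-up names like point.tpl, CLASS and PROP; not my actual setup): one template file run through the preprocessor once per target language.

    /* point.tpl -- one source, two outputs:
     *   cpp -P -DFOR_JAVA   point.tpl > Point.java
     *   cpp -P -DFOR_CSHARP point.tpl > Point.cs
     * (-P suppresses the #line markers in the preprocessed output)   */
    #ifdef FOR_JAVA
    #  define CLASS(name)   public class name {
    #  define END_CLASS     }
    #  define PROP(t, n)    private t n; public t get_##n() { return n; }
    #else /* FOR_CSHARP */
    #  define CLASS(name)   public sealed class name {
    #  define END_CLASS     }
    #  define PROP(t, n)    public t n { get; set; }
    #endif

    CLASS(Point)
        PROP(double, x)
        PROP(double, y)
    END_CLASS

Each target is just another -D flag; the hard part in practice is keeping the macro dialect small enough that every target language can express it.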

  • My favorite idea for extreme preprocessing is using C itself as the preprocessor language, optionally with ASP-like <% ... %> brackets as the only syntactic sugar.

I'm another who recalls this mess being used (1990 or so) as the acid test for C compilers, source code analyzers and debuggers. I can still picture the look of pride on a certain salesman's face when he demoed a valgrind-like tool for us that didn't just crumble to pieces when asked to chew on this tangle.

Interestingly, https://news.ycombinator.com/item?id=22188704, which reinvents similar techniques four decades later, was on Hacker News only yesterday.