Comment by kick

6 years ago

This is the source to the original Bourne Shell, shipped in Research UNIX v7. You've probably used GNU Bash, which stands for "Bourne-Again SHell."

The Bourne sh is significant for a few reasons. Primarily, its GNU descendant is now installed on billions of devices.

Perhaps of more interest: Bourne's sh source code heavily abuses C macros to look and feel like ALGOL-68. This is made more significant because it came before C was standardized: it took real knowledge to abuse that much.

While this is C that compiled just fine for the day, and might compile mostly without errors for a compiler with a K&R compatibility mode, it's absolutely wild, and is written compensating for some of K&R's faults (see how it handles true v. false).
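For a feel of the style, here is a minimal sketch of ALGOL-flavoured macros layered over ordinary C. The macro set and the `count_positive` helper are made up for illustration and are not copied from mac.h; check the linked source for the real definitions.

```c
#include <stddef.h>

/* Illustrative ALGOL-68-flavoured keywords; chosen for this sketch,
   not taken verbatim from the shell's mac.h. */
#define IF    if (
#define THEN  ) {
#define ELSE  } else {
#define FI    }
#define WHILE while (
#define DO    ) {
#define OD    }

/* Hypothetical helper: count the strictly positive entries of v[0..n-1]. */
int count_positive(const int *v, size_t n)
{
    int count = 0;
    size_t i = 0;

    WHILE i < n DO
        IF v[i] > 0 THEN
            count++;
        FI
        i++;
    OD
    return count;
}
```

After preprocessing, the body is a plain `while`/`if` nest. The keywords only hide the parentheses and braces, which is why the result still compiles as C while reading like something else.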

I recommend mac.h as a file of particular interest:

https://www.tuhs.org/cgi-bin/utree.pl?file=V7/usr/src/cmd/sh...

Bonus points to anyone who understands what these three lines are doing within it:

     #define LOBYTE 0377
     #define STRIP 0177
     #define QUOTE 0200

Also of interest:

This was specifically the reason that the International Obfuscated C Code Contest was created, started just minutes after seeing Bourne's sh for the first time (I'm sorry for formatting this as a code block, but the formatting breaks otherwise):

     Q: How did the IOCCC get started?
     A: One day (23 March 1984 to be exact), back Larry 
    Bassel and I (Landon Curt Noll) were working for National 
    Semiconductor's Genix porting group, we were both in our 
    offices trying to fix some very broken code. Larry had been
    trying to fix a bug in the classic Bourne shell (C code 
    #defined to death to sort of look like Algol) and I had been
    working on the finger program from early BSD (a bug ridden 
    finger implementation to be sure). We happened to both 
    wander (at the same time) out to the hallway in Building 7C 
    to clear our heads.

     We began to compare notes: ''You won't believe the code
    I am trying to fix''. And: ''Well you cannot imagine the 
    brain damage level of the code I'm trying to fix''. As well 
    as: ''It more than bad code, the author really had to try to
    make it this bad!''.

    After a few minutes we wandered back into my office 
    where I posted a flame to net.lang.c inviting people to try 
    and out obfuscate the UN*X source code we had just been 
    working on.

> Bonus points to anyone who understands what these three lines are doing within it:

     #define LOBYTE 0377
     #define STRIP 0177
     #define QUOTE 0200

That's just 0xff, 0x7f and 0x80 in octal, and the high bit used to be a flag for all kinds of "magic" behaviour back when 7-bit ASCII was the norm...
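A small sketch of how three masks like these can work together, assuming the high bit marks a character as quoted (an assumption suggested by the names, not a reading of the shell's source). The helper names `mark_quoted`, `is_quoted` and `strip_quote` are hypothetical.

```c
#define LOBYTE 0377 /* 0xff: keep the low eight bits */
#define STRIP  0177 /* 0x7f: keep the seven ASCII bits */
#define QUOTE  0200 /* 0x80: the high bit, treated here as a "quoted" flag */

/* Hypothetical helper: tag a 7-bit character as quoted by setting the high bit. */
int mark_quoted(int c)
{
    return (c | QUOTE) & LOBYTE;
}

/* Hypothetical helper: test whether the flag is set. */
int is_quoted(int c)
{
    return (c & QUOTE) != 0;
}

/* Hypothetical helper: drop the flag and recover the original ASCII character. */
int strip_quote(int c)
{
    return c & STRIP;
}
```

With this scheme a quoted `*` (052) travels through the program as 0252, so code that looks for the plain metacharacter does not match it, and `strip_quote` restores it at the end. The obvious cost is that the data path is no longer 8-bit clean.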

Everyone talks about the macro abuse, nobody talks about the memory management abuse:

https://www.in-ulm.de/~mascheck/bourne/segv.html

> In comp.arch, 05/97, <5m2mu4$guf$1@murrow.corp.sgi.com>, John Mashey writes:

> For speed, Steve B had used a clever trick of using a memory arena without checking for the end, but placing it so that running off the end would cause a memory fault, which the shell then trapped, allocated more memory, then returned to the instruction that caused the trap and continued. The MC68000 (in order to go fast) had an exception model that broke this (among other things) and caused some grief to a whole generation of people porting UNIX to 68Ks in the early 1980s.

  • To be fair, relying on the existence of memory protection is not exactly a horrible perversion. Hell, that's precisely how the stack grows dynamically on Windows: there is a guard page at the bottom, and when it's touched, the kernel commits it and marks the page below it as the new guard page. The downside is that stack allocations larger than a page (4 KB) have to probe memory manually to trigger this behaviour; that's what _chkstk from the CRT does.

    Most of such low-level hacks are nowadays reserved exclusively for runtime implementations, probably for the better.

    • The early 68k machines had memory protection; they just couldn't resume the instruction that triggered the fault, and that wasn't fixed until the 68010. You can also see evidence of this in early 68k C compilers: the function entry code would include a dummy instruction that probed the far end of the stack area needed for local variables.
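The trap-and-resume trick discussed above can be sketched on a modern system. This is not the shell's code: it assumes Linux-style behaviour (a store to a `PROT_NONE` page raises SIGSEGV with `si_addr` set, and returning from the handler retries the store), and it uses `mmap`/`mprotect` where the original grew the break. The names `arena_init`, `arena_fill` and `on_fault` are made up for the example.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE 1 /* exposes MAP_ANONYMOUS on glibc under strict -std */
#endif
#include <signal.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

static char *arena_base;                  /* start of the reserved range */
static size_t arena_size;                 /* reserved bytes, a page multiple */
static size_t page_size;
static volatile sig_atomic_t fault_count; /* pages made usable on demand */

/* SIGSEGV handler: if the fault lies inside the arena, make that page
   usable and return so the faulting store is retried. mprotect is not
   formally async-signal-safe; this is a demonstration only. */
static void on_fault(int sig, siginfo_t *info, void *ctx)
{
    uintptr_t addr = (uintptr_t)info->si_addr;
    uintptr_t lo = (uintptr_t)arena_base;

    (void)sig;
    (void)ctx;
    if (addr < lo || addr >= lo + arena_size)
        _exit(1); /* a genuine crash, not arena growth */
    addr &= ~(uintptr_t)(page_size - 1);
    if (mprotect((void *)addr, page_size, PROT_READ | PROT_WRITE) != 0)
        _exit(1);
    fault_count++;
}

/* Reserve `pages` pages of address space with no access rights and
   install the handler. Returns 0 on success, -1 on failure. */
int arena_init(size_t pages)
{
    struct sigaction sa;
    void *p;

    page_size = (size_t)sysconf(_SC_PAGESIZE);
    arena_size = pages * page_size;
    p = mmap(NULL, arena_size, PROT_NONE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;
    arena_base = p;

    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = on_fault;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGSEGV, &sa, NULL);
}

/* Store n bytes with no "is this memory there yet?" check at all;
   the fault handler supplies pages as the writes reach them. */
void arena_fill(size_t n, char c)
{
    volatile char *p = arena_base;
    size_t i;

    for (i = 0; i < n; i++)
        p[i] = c;
}
```

Calling `arena_init(8)` and then `arena_fill(3 * page_size, 'x')` should take one fault per page touched. The whole scheme depends on the faulting instruction being restartable after the handler returns, which is the property the replies say the original 68000 lacked.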

Stephen Bourne's love for Algol 68 is also why the control flow in his shell makes use of backwards words such as 'fi' and 'esac' to end blocks: they originated in Algol 68, and Bourne loved them so much he put them in his shell.

  • And the only asymmetry there is do matching with done, as opposed to od, because od was already taken by the octal dump program.

You don't really have to use code blocks for that. Here's a copy that will be readable on mobile and preserves the formatting from the original (which the code block didn't):

Q: How did the IOCCC get started?

A: One day (23 March 1984 to be exact), back Larry Bassel and I (Landon Curt Noll) were working for National Semiconductor's Genix porting group, we were both in our offices trying to fix some very broken code. Larry had been trying to fix a bug in the classic Bourne shell (C code #defined to death to sort of look like Algol) and I had been working on the finger program from early BSD (a bug ridden finger implementation to be sure). We happened to both wander (at the same time) out to the hallway in Building 7C to clear our heads.

We began to compare notes: "You won't believe the code I am trying to fix". And: "Well you cannot imagine the brain damage level of the code I'm trying to fix". As well as: "It more than bad code, the author really had to try to make it this bad!".

After a few minutes we wandered back into my office where I posted a flame to net.lang.c inviting people to try and out obfuscate the UN*X source code we had just been working on.

From: https://www.ioccc.org/faq.html

  • I tried it without, initially.

    How'd you get the censored UNIX to work without italicizing half of the comment incorrectly? I tried escaping it in a few different ways, and no dice.

    • That must have been sheer luck! Or maybe because it was the last asterisk in the comment, and the ones before it were paired?

      I took the original and replaced the doubled single quotes with regular double quotes, and surrounded the italicized quotes with asterisks. That seemed to do the trick.

      Of course if the censored UNIX appeared earlier, or more than once, that would likely be a problem. Here are three of them in a row:

      UNX UNX UN*X

      Here's another idea. I suspect that the other Unicode asterisk-like characters might render without triggering italics. I'll use the asterisk operator (U+2217), heavy asterisk (U+2731), bold five spoked asterisk (U+1F7B1) in that order. Let's see how they look:

      UN∗X UNX UN🞱X

      OK, it looks like the heavy asterisk just disappears, bold five spoke is too big in Chrome on Windows and a box with an X in it in Chrome on Android, but the asterisk operator is a pretty good substitute for the regular one (albeit a bit small and light on desktop, a bit large on mobile). Now let's see if repeating it triggers any formatting:

      UN∗X UN∗X UN∗X

      I think we have a winner! Ah, the joys of HN comment formatting...

> Algol 68

By the way, it is an interesting (and simple) exercise to use macros to make C look almost like Oberon-07.