The Bourne Shell Source Code

6 years ago (tuhs.org)

This is the source to the original Bourne Shell, shipped in Research UNIX v7. You've probably used GNU Bash, which stands for "Bourne-Again SHell."

The Bourne sh is significant for a few reasons. Primarily, its GNU descendant is now installed on billions of devices.

Perhaps of more interest: Bourne's sh source code heavily abuses C macros to look and feel like ALGOL-68. What makes this more impressive is that it predates standardized C: it took real knowledge of the preprocessor to abuse it that thoroughly.

While this is C that compiled just fine in its day, and might still compile mostly without errors under a compiler with a K&R compatibility mode, it's absolutely wild, and it's written to compensate for some of K&R C's shortcomings (see how it handles true vs. false).

I recommend mac.h as a file of particular interest:

https://www.tuhs.org/cgi-bin/utree.pl?file=V7/usr/src/cmd/sh...

Bonus points to anyone who understands what these three lines are doing within it:

     #define LOBYTE 0377
     #define STRIP 0177
     #define QUOTE 0200

Also of interest:

This shell was specifically the reason the International Obfuscated C Code Contest was created; the idea came up just minutes after a session spent wrestling with Bourne's sh (I'm sorry for formatting this as a code block, but the formatting breaks otherwise):

     Q: How did the IOCCC get started?
     A: One day (23 March 1984 to be exact), back Larry 
    Bassel and I (Landon Curt Noll) were working for National 
    Semiconductor's Genix porting group, we were both in our 
    offices trying to fix some very broken code. Larry had been
    trying to fix a bug in the classic Bourne shell (C code 
    #defined to death to sort of look like Algol) and I had been
    working on the finger program from early BSD (a bug ridden 
    finger implementation to be sure). We happened to both 
    wander (at the same time) out to the hallway in Building 7C 
    to clear our heads.

     We began to compare notes: ''You won't believe the code
    I am trying to fix''. And: ''Well you cannot imagine the 
    brain damage level of the code I'm trying to fix''. As well 
    as: ''It more than bad code, the author really had to try to
    make it this bad!''.

    After a few minutes we wandered back into my office 
    where I posted a flame to net.lang.c inviting people to try 
    and out obfuscate the UN*X source code we had just been 
    working on.

  • > Bonus points to anyone who understands what these three lines are doing within it:

         #define LOBYTE 0377
         #define STRIP 0177
         #define QUOTE 0200
    

    That's just 0xff, 0x7f and 0x80 in octal, and the high bit used to be a flag for all kinds of "magic" behaviour back when 7-bit ASCII was the norm...
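    A minimal sketch of the idea (my own illustration, not code from the shell): set the 0200 bit to remember that a character was quoted, strip it back off with 0177 when you want the plain ASCII value, and mask with 0377 to truncate a wider value to a single byte.

         /* Toy illustration (not from sh) of the three masks. */
         #include <stdio.h>

         #define LOBYTE 0377   /* keep only the low 8 bits of a wider value   */
         #define STRIP  0177   /* drop the high bit: back to plain 7-bit ASCII */
         #define QUOTE  0200   /* high bit set: "this character was quoted"    */

         int main(void)
         {
             int c = 'a' | QUOTE;                        /* tag 'a' as quoted */
             printf("quoted? %d\n", (c & QUOTE) != 0);   /* 1                 */
             printf("plain:  %c\n", c & STRIP);          /* a                 */
             printf("byte:   %o\n", (c + 0400) & LOBYTE);/* 341: bit 9 masked */
             return 0;
         }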

  • Everyone talks about the macro abuse, nobody talks about the memory management abuse:

    https://www.in-ulm.de/~mascheck/bourne/segv.html

    > In comp.arch, 05/97, <5m2mu4$guf$1@murrow.corp.sgi.com>, John Mashey writes:

    > For speed, Steve B had used a clever trick of using a memory arena without checking for the end, but placing it so that running off the end would cause a memory fault, which the shell then trapped, allocated more memory, then returned to the instruction that caused the trap and continued. The MC68000 (in order to go fast) had an exception model that broke this (among other things) and caused some grief to a whole generation of people porting UNIX to 68Ks in the early 1980s.
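    The same trick still works on a modern POSIX system. A minimal sketch (my own reconstruction of the idea, not the V7 code): reserve a large region with no access, hand out only the first page, and let the SIGSEGV handler map in more pages so the faulting store simply retries.

         /* Sketch of the fault-and-extend arena (Linux/POSIX, not V7 code). */
         #include <signal.h>
         #include <stdio.h>
         #include <stdlib.h>
         #include <string.h>
         #include <sys/mman.h>

         #define PAGE    4096            /* assume 4 KiB pages for the sketch */
         #define RESERVE (64 * PAGE)

         static char *arena;             /* start of the reserved region      */
         static size_t usable = PAGE;    /* bytes currently mapped read/write */

         static void grow(int sig, siginfo_t *si, void *ctx)
         {
             (void)sig; (void)ctx;
             char *addr = (char *)si->si_addr;
             if (addr < arena || addr >= arena + RESERVE)
                 abort();                /* a real crash, not an arena overrun */
             /* Extend the arena up to and including the faulting page, then
              * return: the instruction that faulted is re-executed and works. */
             size_t need = ((size_t)(addr - arena) / PAGE + 1) * PAGE;
             if (need <= usable)         /* be safe about straddling accesses  */
                 need = usable + PAGE;
             mprotect(arena, need, PROT_READ | PROT_WRITE);
             usable = need;
         }

         int main(void)
         {
             struct sigaction sa = {0};
             sa.sa_sigaction = grow;
             sa.sa_flags = SA_SIGINFO;
             sigaction(SIGSEGV, &sa, NULL);

             /* Reserve a big region, but make only the first page usable. */
             arena = mmap(NULL, RESERVE, PROT_NONE,
                          MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
             mprotect(arena, PAGE, PROT_READ | PROT_WRITE);

             memset(arena, 'x', 3 * PAGE);   /* runs off the end of the arena */
             printf("arena grew to %zu bytes\n", usable);
             return 0;
         }

    The whole scheme depends on being able to restart the faulting instruction exactly, which is precisely what the original MC68000 couldn't guarantee, as Mashey says. (And mprotect isn't strictly async-signal-safe, so treat this as a sketch rather than production code.)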

    • To be fair, that's not exactly a horrible perversion; it just relies on the existence of memory protection. Hell, that's precisely how the stack grows dynamically on Windows: there is a guard page at the bottom, and when it's touched, the kernel allocates it and marks the page below it as the new guard page. The downside is that stack allocations larger than 4 K have to manually probe memory to trigger this behaviour; that's what _chkstk from the CRT does.

      Most such low-level hacks are nowadays reserved exclusively for runtime implementations, probably for the better.

  • Stephen Bourne's love for Algol 68 is also why the control flow in his shell uses reversed keywords such as 'fi' and 'esac' to close blocks: they originated in Algol 68, and he liked them enough to carry them over into the shell language.

    • And the only asymmetry there is 'do' matching with 'done' rather than 'od', because od was already taken by the octal dump program.

  • You don't really have to use code blocks for that. Here's a copy that will be readable on mobile and preserves the formatting from the original (which the code block didn't):

    Q: How did the IOCCC get started?

    A: One day (23 March 1984 to be exact), back Larry Bassel and I (Landon Curt Noll) were working for National Semiconductor's Genix porting group, we were both in our offices trying to fix some very broken code. Larry had been trying to fix a bug in the classic Bourne shell (C code #defined to death to sort of look like Algol) and I had been working on the finger program from early BSD (a bug ridden finger implementation to be sure). We happened to both wander (at the same time) out to the hallway in Building 7C to clear our heads.

    We began to compare notes: "You won't believe the code I am trying to fix". And: "Well you cannot imagine the brain damage level of the code I'm trying to fix". As well as: "It more than bad code, the author really had to try to make it this bad!".

    After a few minutes we wandered back into my office where I posted a flame to net.lang.c inviting people to try and out obfuscate the UN*X source code we had just been working on.

    From: https://www.ioccc.org/faq.html

    • I tried it without, initially.

      How'd you get the censored UNIX to work without italicizing half of the comment incorrectly? I tried escaping it in a few different ways, and no dice.

  • > Algol 68

    By the way, it is an interesting (and simple) exercise to use macros to make C look almost like Oberon-07.
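    For instance, a toy sketch (my own guess at the flavour, nothing rigorous):

         /* C dressed up to read a little like Oberon-07 (toy only). */
         #include <stdio.h>

         #define MODULE(name)  int main(void)
         #define BEGIN         {
         #define END           }
         #define IF            if (
         #define THEN          ) {
         #define ELSE          } else {
         #define WHILE         while (
         #define DO            ) {
         #define RETURN        return

         MODULE(Demo)
         BEGIN
             int i = 0;
             WHILE i < 3 DO
                 IF i == 0 THEN
                     printf("zero\n");
                 ELSE
                     printf("%d\n", i);
                 END
                 i = i + 1;
             END
             RETURN 0;
         END

    It only gets you the surface syntax, of course, but it's striking how far a dozen defines will take you.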

Back when I was writing C compilers, the Bourne Shell was my nemesis. The Bourne shell exercised nearly every "feature" of the C language. Compiling and then running the shell was a great test case for an optimizing compiler and turned up many bugs. But when the compiled code failed, winding back to the underlying C through all of the macros and optimizations was exceptionally difficult. I still remember many a late night trying to figure out what happened. (Thanks, Steve, for many fascinating hours of struggle.)

The whole program relies on definitions in mac.h [1] like:

    #define IF if(
    #define THEN ){
    #define ELSE } else {
    #define ELIF } else if (
    #define FI ;}

    #define BEGIN {
    #define END }
    #define SWITCH switch(
    #define IN ){
    #define ENDSW }
    #define FOR for(
    #define WHILE while(
    ...
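Written against those macros, a function reads like this (a made-up fragment in the same style, not copied from the shell; mac.h also defines a DO/OD pair for loop bodies that the excerpt above elides):

    /* Toy example, self-contained; not a fragment of the real source. */
    #include <stdio.h>

    #define IF      if(
    #define THEN    ){
    #define ELIF    } else if (
    #define ELSE    } else {
    #define FI      ;}
    #define BEGIN   {
    #define END     }
    #define WHILE   while(
    #define DO      ){
    #define OD      ;}

    int classify(int c)
    BEGIN
        IF c == ' '
        THEN return 0;
        ELIF c >= '0' && c <= '9'
        THEN return 1;
        ELSE return 2;
        FI
    END

    int main(void)
    BEGIN
        int i = 0;
        WHILE i < 3
        DO  printf("%d -> %d\n", i, classify("a1 "[i]));
            i++;
        OD
        return 0;
    END

Run it through cc -E and it collapses back into ordinary if/else/while; the braces live inside THEN, ELSE and FI, which is how the style makes a forgotten closing brace hard to write in the first place.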

Isn't this nowadays considered bad practice? After glancing at the code, I can see some advantages, such as not being able to forget a closing brace. Is there any other explanation for why they created a dialect on top of C using the preprocessor?

[1] https://www.tuhs.org/cgi-bin/utree.pl?file=V7/usr/src/cmd/sh...

EDIT: fix English

  • Bourne liked ALGOL. A lot. So much that he was one of the few people who wrote their own ALGOL-68 compiler. Using the preprocessor to feel more at home is a pretty good idea in this case.

    This wasn't particularly popular with anyone who wasn't, well, Bourne, even at the time. I posted an example here: https://news.ycombinator.com/item?id=22199664

  • I'm under the impression that building up higher-level constructs with macros was very common among assembly programmers. Since C was new, whoever wrote this may have come from assembly and brought the habit with them.

  • The author worked on ALGOL 68C and probably intended to reuse the more familiar syntax.

    Nowadays, I would indeed consider it a bad practice to use such macros. Especially if you intend to share the project with anyone else.

As someone who used the C preprocessor to generate C++, Java, and C# from a common source in order to have a common library for native apps, I always appreciate a good bit of preprocessor abuse; it's one of the things that makes C so much fun!
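For anyone curious what that looks like, here is a toy sketch (my own construction, with made-up names like point.tpl, CLASS and PROP; not my actual setup): one template file run through the preprocessor once per target language.

    /* point.tpl -- one source, two outputs:
     *   cpp -P -DFOR_JAVA   point.tpl > Point.java
     *   cpp -P -DFOR_CSHARP point.tpl > Point.cs
     * (-P suppresses the #line markers in the preprocessed output)   */
    #ifdef FOR_JAVA
    #  define CLASS(name)   public class name {
    #  define END_CLASS     }
    #  define PROP(t, n)    private t n; public t get_##n() { return n; }
    #else /* FOR_CSHARP */
    #  define CLASS(name)   public sealed class name {
    #  define END_CLASS     }
    #  define PROP(t, n)    public t n { get; set; }
    #endif

    CLASS(Point)
        PROP(double, x)
        PROP(double, y)
    END_CLASS

Each target is just another -D flag; the hard part in practice is keeping the macro dialect small enough that every target language can express it.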

  • My favorite idea for extreme preprocessing is using C itself as the preprocessor language, optionally with ASP-like <% ... %> brackets as the only syntactic sugar.

I'm another who recalls this mess being used (1990 or so) as the acid test for C compilers, source code analyzers and debuggers. I can still picture the look of pride on a certain salesman's face when he demoed a valgrind-like tool for us that didn't just crumble to pieces when asked to chew on this tangle.

Interestingly, https://news.ycombinator.com/item?id=22188704, which reinvents similar techniques four decades later, was on Hacker News only yesterday.