This is the source to the original Bourne Shell, shipped in Research UNIX v7. You've probably used GNU Bash, which stands for "Bourne-Again SHell."
The Bourne sh is significant for a few reasons. Primarily, its GNU descendant is now installed on billions of devices.
Perhaps of more interest: Bourne's sh source code heavily abuses C macros to look and feel like ALGOL 68. This is made more significant because it came before C was standardized: it took real knowledge to abuse the language that deeply.
While this is C that compiled just fine in its day, and might still compile mostly without errors under a compiler with a K&R compatibility mode, it's absolutely wild, and it's written to compensate for some of K&R C's shortcomings (see how it handles true vs. false).
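To make the true-vs.-false point concrete, here is a minimal sketch. As I recall, the two #define lines are the ones mac.h actually uses; the demo around them is mine. K&R C had no boolean type, and defining TRUE as -1 (all bits set) rather than 1 means bitwise ~ doubles as logical negation:

    #include <stdio.h>

    #define TRUE  (-1)   /* all bits set, as in mac.h */
    #define FALSE 0

    int main(void)
    {
        int interactive = TRUE;

        if (interactive)              /* any nonzero value is truthy */
            printf("prompting\n");

        /* With TRUE as -1 rather than 1, bitwise NOT round-trips:
           ~TRUE == FALSE and ~FALSE == TRUE. */
        printf("%d %d\n", ~TRUE == FALSE, ~FALSE == TRUE);
        return 0;
    }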
I recommend mac.h as a file of particular interest:
https://www.tuhs.org/cgi-bin/utree.pl?file=V7/usr/src/cmd/sh...
Bonus points to anyone who understands what these three lines are doing within it:

    #define LOBYTE 0377
    #define STRIP  0177
    #define QUOTE  0200
Also of interest:
This was specifically the reason that the International Obfuscated C Code Contest was created, started just minutes after seeing Bourne's sh for the first time (I'm sorry for formatting this as a code block, but the formatting breaks otherwise):

    Q: How did the IOCCC get started?

    A: One day (23 March 1984 to be exact), back when Larry
    Bassel and I (Landon Curt Noll) were working for National
    Semiconductor's Genix porting group, we were both in our
    offices trying to fix some very broken code. Larry had been
    trying to fix a bug in the classic Bourne shell (C code
    #defined to death to sort of look like Algol) and I had been
    working on the finger program from early BSD (a bug ridden
    finger implementation to be sure). We happened to both
    wander (at the same time) out to the hallway in Building 7C
    to clear our heads.

    We began to compare notes: ''You won't believe the code
    I am trying to fix''. And: ''Well you cannot imagine the
    brain damage level of the code I'm trying to fix''. As well
    as: ''It's more than bad code, the author really had to try to
    make it this bad!''.

    After a few minutes we wandered back into my office
    where I posted a flame to net.lang.c inviting people to try
    and out obfuscate the UN*X source code we had just been
    working on.
> Bonus points to anyone who understands what these three lines are doing within it:
That's just 0xff, 0x7f and 0x80 in octal, and the high bit used to be a flag for all kinds of "magic" behaviour back when 7-bit ASCII was the norm...
Yup, the dash shell (from Debian) still uses this, so it won't support Unicode any time soon.
I find myself validating a lot of input to be ASCII still. I think it's time to write a lib to make use of all those wasted bits.
Bingo!
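For readers who want the trick in this sub-thread spelled out, here is a minimal sketch (the demo is mine; STRIP and QUOTE are the names mac.h gives 0177 and 0200). With 7-bit ASCII, the eighth bit is free, so the shell can tag a character as quoted in place:

    #include <stdio.h>

    #define STRIP 0177   /* 0x7f: keep the seven ASCII bits */
    #define QUOTE 0200   /* 0x80: mark a character as quoted */

    int main(void)
    {
        int c = '*';
        int tagged = c | QUOTE;      /* remember this '*' was quoted */

        if (tagged & QUOTE)          /* later: was it quoted? */
            printf("quoted: %c\n", tagged & STRIP);  /* prints: quoted: * */
        return 0;
    }

This is also exactly why such a shell can't pass 8-bit (let alone UTF-8) bytes through unscathed: the tag bit collides with real data.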
Everyone talks about the macro abuse, nobody talks about the memory management abuse:
https://www.in-ulm.de/~mascheck/bourne/segv.html
> In comp.arch, 05/97, <5m2mu4$guf$1@murrow.corp.sgi.com>, John Mashey writes:
> For speed, Steve B had used a clever trick of using a memory arena without checking for the end, but placing it so that running off the end would cause a memory fault, which the shell then trapped, allocated more memory, then returned to the instruction that caused the trap and continued. The MC68000 (in order to go fast) had an exception model that broke this (among other things) and caused some grief to a whole generation of people porting UNIX to 68Ks in the early 1980s.
To be fair, that's not exactly a horrible perversion; it merely relies on the existence of memory protection. Hell, that's precisely how the stack grows dynamically on Windows: there is a guard page at the bottom, and when it's touched, the kernel commits it and marks the page below it as the new guard page. The downside is that stack allocations larger than 4 KB have to manually probe memory to trigger this behaviour; that's what _chkstk from the CRT does.
Most such low-level hacks are nowadays reserved exclusively for runtime implementations, probably for the better.
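For the curious, here is a hedged sketch of the trap-and-grow idea on a modern POSIX system. This is not Bourne's actual code, and names like arena_end are mine: the handler makes one more page writable at the fault address and returns, and the kernel restarts the faulting store.

    /* Sketch of the trap-and-grow arena trick -- not Bourne's code.
     * Assumes POSIX sigaction/mmap/mprotect.  (mprotect isn't
     * async-signal-safe, but it illustrates the mechanism.)
     */
    #include <signal.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/mman.h>

    #define PAGE 4096

    static char *arena;       /* start of the arena */
    static char *arena_end;   /* first byte that isn't writable yet */

    static void grow(int sig, siginfo_t *si, void *ctx)
    {
        (void)sig; (void)ctx;
        /* Only handle a fault just past our arena. */
        if ((char *)si->si_addr != arena_end) {
            signal(SIGSEGV, SIG_DFL);   /* let a real crash be a crash */
            return;
        }
        /* Make one more page writable; on return, the kernel re-runs
         * the very store instruction that faulted. */
        mprotect(arena_end, PAGE, PROT_READ | PROT_WRITE);
        arena_end += PAGE;
    }

    int main(void)
    {
        struct sigaction sa;
        memset(&sa, 0, sizeof sa);
        sa.sa_sigaction = grow;
        sa.sa_flags = SA_SIGINFO;
        sigaction(SIGSEGV, &sa, NULL);

        /* Reserve address space, but make only the first page usable. */
        arena = mmap(NULL, 16 * PAGE, PROT_NONE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        mprotect(arena, PAGE, PROT_READ | PROT_WRITE);
        arena_end = arena + PAGE;

        /* Writing past the end now faults, grows the arena, continues. */
        for (int i = 0; i < 3 * PAGE; i++)
            arena[i] = 'x';
        printf("arena grew to %d bytes\n", (int)(arena_end - arena));
        return 0;
    }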
2 replies →
> memory fault
That reminds me of how virtual memory works.
Stephen Bourne's love for Algol 68 is also why the control flow in his shell uses reversed keywords such as 'fi' and 'esac' to close blocks: they originated in Algol 68, and Bourne carried them straight over into his shell.
And the only asymmetry there is 'do' matching with 'done' rather than 'od', because 'od' was already taken by the octal-dump program.
You don't really have to use code blocks for that. Here's a copy that will be readable on mobile and preserves the formatting from the original (which the code block didn't):
Q: How did the IOCCC get started?
A: One day (23 March 1984 to be exact), back when Larry Bassel and I (Landon Curt Noll) were working for National Semiconductor's Genix porting group, we were both in our offices trying to fix some very broken code. Larry had been trying to fix a bug in the classic Bourne shell (C code #defined to death to sort of look like Algol) and I had been working on the finger program from early BSD (a bug ridden finger implementation to be sure). We happened to both wander (at the same time) out to the hallway in Building 7C to clear our heads.
We began to compare notes: "You won't believe the code I am trying to fix". And: "Well you cannot imagine the brain damage level of the code I'm trying to fix". As well as: "It's more than bad code, the author really had to try to make it this bad!".
After a few minutes we wandered back into my office where I posted a flame to net.lang.c inviting people to try and out obfuscate the UN*X source code we had just been working on.
From: https://www.ioccc.org/faq.html
I tried it without, initially.
How'd you get the censored UNIX to work without italicizing half of the comment incorrectly? I tried escaping it in a few different ways, and no dice.
> Algol 68
By the way, it is an interesting (and simple) exercise to use macros to make C look almost like Oberon-07.
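A hedged toy version of that exercise (my own macro names, not from any real project). Oberon-07 closes every construct with END, which maps surprisingly cleanly onto braces:

    #include <stdio.h>

    #define MODULE(name)  int main(void)
    #define BEGIN         {
    #define END           }
    #define IF            if (
    #define THEN          ) {
    #define ELSE          } else {
    #define WHILE         while (
    #define DO            ) {

    MODULE(Demo)
    BEGIN
        int i = 0;
        WHILE i < 3 DO
            printf("i = %d\n", i);
            i = i + 1;
        END
        IF i == 3 THEN
            printf("done\n");
        END
        return 0;
    END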
Back when I was writing C compilers the Bourne Shell was my nemesis. The Bourne shell did exercise nearly every "feature" of the C language. Compiling and then running the shell was a great test case for an optimizing compiler and turned up many bugs. But when the compiled code failed, winding back to the underlying C through all of the macros and optimizations was exceptionally difficult. I still remember many a late night trying to figure out what happened. (Thanks Steve, for many fascinating hours of struggle.)
The whole program trusts definitions in mac.h [1] like:

    #define IF      if(
    #define THEN    ){
    #define ELSE    } else {
    #define ELIF    } else if (
    #define FI      ;}
    #define BEGIN   {
    #define END     }
    #define SWITCH  switch(
    #define IN      ){
    #define ENDSW   }
    #define FOR     for(
    #define WHILE   while(
    ...
Isn't this nowadays considered bad practice? After glancing at the code, I can see some advantages, like not forgetting to add missing {}. Is there any other explanation for why they created a dialect on top of C using the preprocessor?
[1] https://www.tuhs.org/cgi-bin/utree.pl?file=V7/usr/src/cmd/sh...
EDIT: fix English
Bourne liked ALGOL. A lot. So much that he was one of the few people who wrote their own ALGOL-68 compiler. Using the preprocessor to feel more at home is a pretty good idea in this case.
This wasn't particularly popular with anyone who wasn't, well, Bourne, even at the time. I posted an example here: https://news.ycombinator.com/item?id=22199664
I'm under the impression that building up higher-level languages using macros was very common among assembly programmers. Since C was new, whoever wrote this may have come from assembly and taken the habit with them.
The author worked on ALGOL 68C and probably intended to reuse the more familiar syntax.
Nowadays, I would indeed consider it a bad practice to use such macros. Especially if you intend to share the project with anyone else.
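To make the flavor of that dialect concrete, here is a toy function written against the macro list above (the macros are mac.h's, including DO and OD from the elided lines; the function is my own sketch), and it still compiles as plain C today:

    #include <stdio.h>

    #define IF      if(
    #define THEN    ){
    #define FI      ;}
    #define BEGIN   {
    #define END     }
    #define WHILE   while(
    #define DO      ){
    #define OD      ;}

    int countdown(int n)
    BEGIN
        WHILE n > 0
        DO  printf("%d\n", n);
            n = n - 1;
        OD
        IF n == 0
        THEN printf("liftoff\n");
        FI
        return n;
    END

    int main(void)
    BEGIN
        return countdown(3);
    END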
As someone who used the C preprocessor to generate C++, Java, and C# from a common source in order to have a common library for native apps, I always appreciate a good bit of preprocessor abuse - it's one of the things that makes C so much fun!
My favorite idea for extreme preprocessing is using C itself as the preprocessor language, optionally with the only syntactic sugar provided by ASP-like <% %> brackets.
I'm another who recalls this mess being used (1990 or so) as the acid test for C compilers, source code analyzers and debuggers. I can still picture the look of pride on a certain salesman's face when he demoed a valgrind-like tool for us that didn't just crumble to pieces when asked to chew on this tangle.
Interestingly, https://news.ycombinator.com/item?id=22188704 , with similar techniques reinvented 4 decades later, was on Hacker News only yesterday.
Yes, I commented there about the Bourne Shell and the IOCCC: https://news.ycombinator.com/item?id=22192910
The technique of macroing C to death to look like another language never died! It's still a very popular thing to do, but it's gotten a bit less wild now that C has been standardized to death.
I remember in college some friends of mine decided to macro C into really bad fake German. Think 'inten mainen' and 'printenoutenf'.
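In the same spirit, a quick hedged guess at what that might have looked like (entirely my reconstruction from the two names given):

    #include <stdio.h>

    #define inten         int
    #define mainen        main
    #define printenoutenf printf
    #define returnen      return

    inten mainen(void)
    {
        printenoutenf("hallo, welt\n");
        returnen 0;
    }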