Four Column ASCII

9 years ago (garbagecollected.org)

One bit of trivia I really love is why "DEL" is at 127 -- weirdly way away from all the control codes.

It's because 0x7F is all ASCII bits set to "1". Back in the early punch card (and telegraph?) days, if there was a typo, you couldn't "unpunch" a hole to make it a 0 again, but you could punch out all the rest of them, indicating "ignore this char, I've deleted it" -- 0b1111111.
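To make the "punch out the rest" idea concrete, here is a minimal Python sketch. It models a punched hole as a 1 bit, so adding holes is a bitwise OR; the helper name `overpunch` is made up for this illustration.

```python
# Model a 7-bit row as an integer: a punched hole is a 1 bit.
# Holes can be added (bitwise OR) but never removed.
DEL = 0x7F  # all seven bits set


def overpunch(code: int) -> int:
    """Punch every remaining hole in a 7-bit row (hypothetical helper)."""
    return code | DEL


# Whatever was punched by mistake, overpunching always yields DEL.
assert all(overpunch(c) == DEL for c in range(128))
print(bin(overpunch(ord("A"))))  # 0b1111111
```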

source: http://www.trafficways.org/ascii/ascii.pdf which is a really neat read if you like that sort of thing :)

ASCII was very carefully designed.

One feature I like is that you can neatly cut out a 64-character 6-bit all-uppercase encoding (like DEC's) if you don't need lowercase.
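A rough sketch of what cutting out that subset could look like in Python, assuming the 64 characters are the ASCII range 0x20 to 0x5F shifted down by 0x20. The function names are hypothetical, and this is not meant as a faithful model of any particular DEC encoding.

```python
def to_sixbit(ch: str) -> int:
    """Map an ASCII character in 0x20..0x5F to a 6-bit value (0..63)."""
    code = ord(ch)
    if not 0x20 <= code <= 0x5F:
        raise ValueError(f"{ch!r} is outside the 64-character subset")
    return code - 0x20


def from_sixbit(value: int) -> str:
    """Inverse of to_sixbit: map 0..63 back to an ASCII character."""
    if not 0 <= value <= 0x3F:
        raise ValueError(f"{value} does not fit in 6 bits")
    return chr(value + 0x20)


# Space, digits, punctuation and uppercase letters all survive the round trip.
text = "HELLO, WORLD 42"
assert "".join(from_sixbit(to_sixbit(c)) for c in text) == text
```

Lowercase letters start at 0x60, so they fall just outside the range and are rejected.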

Wikipedia covers some of the design choices here: https://en.wikipedia.org/wiki/ASCII#Internal_organization

For sentimental reasons I still carry around my Teletype ASCII reference card which is laid out this way. For extra retro goodness it shows the bits as holes on a paper tape with the feed holes and everything.

When you have to design a device that does ASCII entirely mechanically then this is an efficient way to structure it. I would not be surprised to hear that mechanical considerations had an influence on the question of what bits went where.

Added: OK, I have managed to not entirely surprise myself:

* https://en.wikipedia.org/wiki/ASCII#Internal_organization

This seems to say that the influence was partially mechanical typewriters but the Teletype Model 33 entirely followed ASCII.

There is more to it.

For example, note that the numerals map to their direct binary notation plus a 011 in front. 0 => ...0000, 1 => ...0001, 2 => ...0010, etc.
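A quick way to check this in Python (just a sketch; `digit_value` is a name invented here):

```python
def digit_value(ch: str) -> int:
    """Recover a digit's numeric value by keeping only the low four bits."""
    return ord(ch) & 0x0F


for d in "0123456789":
    code = ord(d)
    assert code >> 4 == 0b011         # the three high bits are always 011
    assert digit_value(d) == int(d)   # the low four bits are the value itself

print(format(ord("7"), "07b"))  # 0110111
```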

Now I wonder, why don't they start in the zero row? In other words, why is 0 = 0110000, instead of 0100000?

Why are the parens not in the same row as braces and brackets?

Why isn't "&" (ligature of "et") in the same row as "e"? "$" ("dollar") is in the same row as "d".

Oh, that's why ^J is a literal newline in Emacs. How did I miss that for 25 years?

(BTW, for anyone curious, you match a newline in a regexp in Emacs by typing ^Q ^J: the first is the quote operator and the second is the character you want, ^J, or newline.)

  • If you're wondering "why not ^Q then Enter?", that gets you CR or ^M, but in UNIX, newline or ^J is what gets you to the next line.

    So, yes, you normally push CR to enter the LF character.

    The confusions between CR and LF run deep and wide, even in the UNIX world.

    • You will also see ^M at the end of each line in vim if you open a CRLF-terminated file but vim thinks it is an LF-terminated file.

  • > that's why ^J is a literal newline

    So I don't quite understand how that table explains it, aside from the fact that the LF code is the character code for J with the first two bits zeroed.

    • It is explained under the chart for [ and Esc. CTRL zeroes out the first two bits of a character. So that is the explanation.
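The masking described in this subthread is easy to try out. A minimal Python sketch, where `ctrl` is a made-up helper that models only the bit arithmetic and not what any particular terminal does:

```python
def ctrl(ch: str) -> str:
    """Zero the two high bits of a 7-bit character, as CTRL is described above."""
    return chr(ord(ch) & 0x1F)


assert ctrl("J") == "\n"    # ^J is line feed (LF, 0x0A)
assert ctrl("M") == "\r"    # ^M is carriage return (CR, 0x0D)
assert ctrl("[") == "\x1b"  # ^[ is escape (ESC, 0x1B)
assert ctrl("@") == "\x00"  # ^@ is NUL

print(format(ord("J"), "07b"), "->", format(ord(ctrl("J")), "07b"))
# 1001010 -> 0001010
```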

Very interesting, thanks. I never knew that key combinations on the terminal were actually just shortcuts to send specific control characters.

It's a bit confusing to me that the column header bits are added to the LEFT of the row identifiers. Might be helpful to report the row ids as "__00000" or similar.

This is very neat. I've done lots of work with binary text for hardware, but I would've never noticed this on my own. However, now that it's written out, it seems very straightforward.

   man ascii

on most Linux systems shows a similar layout.

I wonder if 00-1F could be added to the summary, using the Unicode Control Pictures range for added irony.

https://en.wikipedia.org/wiki/Control_Pictures (␀ ␁ ␂ ␃ ␄ ␅ ␆ ␇ ␈ ␉ ␊ ␋ ␌ ␍ ␎ ␏ ␐ ␑ ␒ ␓ ␔ ␕ ␖ ␗ ␘ ␙ ␚ ␛ ␜ ␝ ␞ ␟)
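For anyone who wants to generate that row rather than paste it, here is a small Python sketch. It relies on the Control Pictures block placing the picture for control code N at U+2400 + N, with DEL at U+2421; the function name is invented for this example.

```python
def control_picture(code: int) -> str:
    """Return the Unicode Control Pictures symbol for a C0 control code or DEL."""
    if 0x00 <= code <= 0x1F:
        return chr(0x2400 + code)
    if code == 0x7F:
        return "\u2421"  # SYMBOL FOR DELETE
    raise ValueError(f"0x{code:02X} is not a control code")


row = " ".join(control_picture(c) for c in range(0x20))
print(row)
```

Whether the glyphs render legibly depends on the font, which may or may not add to the irony.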

If you want to geek-out further on ASCII have a look at the late Bob Bemer's site: https://www.bobbemer.com/ Bob is colloquially known as the "father of ASCII" (among other things) and his writing is fun to read and interesting.

Does this mean that ^; and ^{ are also equivalent to ESC?

  • ^{ is, but on my terminal ^; just prints out a ;

    iTerm2, in vim 1.8

    • ^[ and ^{ do the same thing in Terminal.app (macOS Sierra) and vim 8.0. However, I get the bell sound (which generally denotes invalid input in macOS) for ^; and it prints nothing.

  • You have control characters for the characters from 64 to 95.

        Control-@ is 0.  
        Control-A is 1, through control-Z is 26.
        Control-[ is 27, escape.
        Control-\ is 28.
        Control-] is 29.
        Control-^ is 30.
        Control-_ is 31.

  • I guess it could in theory. On my keyboard, however, CTRL seems to override other control chars, so typing ^{ doesn't seem possible at all, at least without any hacks. (I don't have a US layout keyboard.)

  • According to the bit-wise AND at the end of the article it should, but it seems like there's more to it than just zeroing out the 'column' bits.

    • Not sure if this is specifically related, but-

      CTRL does not actually modify the character code sent from the keyboard. For letters, the same keycode (which maps to ASCII with a constant addition of 0x3D) is always sent. Another byte in the HID report contains bit flags for modifier keys (L/R CTRL, SHIFT, etc); the OS decides what happens after that.

      Note that this holds under USB HID.
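To illustrate the HID point, here is a rough Python sketch of how software on the receiving end might turn a keyboard report into a character. It assumes the common 8-byte boot-protocol layout (modifier byte, reserved byte, then keycodes) and handles letters only; `decode_letter` is a hypothetical helper, not how any specific OS does it.

```python
from typing import Optional

# Modifier bit flags in byte 0 of the report.
LEFT_CTRL, LEFT_SHIFT, RIGHT_CTRL, RIGHT_SHIFT = 0x01, 0x02, 0x10, 0x20


def decode_letter(report: bytes) -> Optional[str]:
    """Decode the first keycode of a report, letters only (illustrative)."""
    modifiers, keycode = report[0], report[2]
    if not 0x04 <= keycode <= 0x1D:    # keycodes for the letter keys A..Z
        return None
    upper = keycode + 0x3D             # constant offset to uppercase ASCII
    if modifiers & (LEFT_CTRL | RIGHT_CTRL):
        return chr(upper & 0x1F)       # the receiver applies the CTRL mask
    if modifiers & (LEFT_SHIFT | RIGHT_SHIFT):
        return chr(upper)
    return chr(upper | 0x20)           # set the lowercase bit


# The keycode for the J key is 0x0D whether or not CTRL is held.
assert decode_letter(bytes([0x00, 0, 0x0D, 0, 0, 0, 0, 0])) == "j"
assert decode_letter(bytes([LEFT_CTRL, 0, 0x0D, 0, 0, 0, 0, 0])) == "\n"
```

The keycode byte is identical in both reports; only the modifier byte differs, so the decision about what CTRL means is made after the keyboard.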


The way this is laid out in a table and suddenly properties line up in rows and columns reminds me of the Periodic Table in Chemistry.

Obviously it shows how the ASCII committee used the first two bits as control bits and the remaining bits as a mixture of control and data bits, but I’d never seen it displayed this way.

Really neat.

  • Funny you say that. Here's what I created for my personal use:

    http://files.carussell.fastmail.fm/public/ascii-print-screen...

    I think the horizontal layout is a lot more readable, especially with the ability to read the bitstring more-or-less left-to-right. Only thing is that it's pretty wide, maybe too wide. The document that screenshot is from is meant to take up an A3 sheet split in half lengthwise. Someone willing to spend more time on it than I was would probably be able to come up with helpful notes for the bottom half, or shift things around so elements corresponding to the low bit in the upper nibble are nestled below the items where that bit is off.

And then there's IBM's EBCDIC, of similar vintage to ASCII, but of markedly dissimilar utility.

http://www.ibm.com/support/knowledgecenter/en/SSGH4D_14.1.0/...

  • EBCDIC seems elegant in its own way: http://www.quadibloc.com/comp/cardint.htm (scroll down) -- apparently it's descended from IBM punch card formats. The discontinuity in the alphabet seems inconvenient for sorting, but it looks like it shares some properties (like bit-flip to make lower case) with ASCII.

    • The digits and letters actually map quite nicely to punch cards. You can see how punches 0 - 9 map exactly to EBCDIC F0-F9. And if you check how the letters are coded on punch cards (1 - 9 plus one punch "above" in the zone and 0 rows) you can see how it maps exactly to EBCDIC C1-C9, D1-D9, E2-E9. Most other characters aren't coded quite as neatly; I don't know if there is a system to them.
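The EBCDIC ranges mentioned in this subthread can be checked with Python's built-in `cp037` codec. Treating code page 037 as representative of EBCDIC is an assumption of this sketch (it is one of several variants), and `ebcdic` is a helper name invented here.

```python
import string


def ebcdic(ch: str) -> int:
    """Return the code page 037 byte value for a single character."""
    return ch.encode("cp037")[0]


# Print the first and last code of each contiguous run.
for first, last in ("09", "AI", "JR", "SZ"):
    print(f"{first}-{last}: {ebcdic(first):02X}-{ebcdic(last):02X}")

# The low nibble of a digit is its value, like the 0-9 punch rows.
assert all(ebcdic(d) == 0xF0 + int(d) for d in string.digits)

# Setting one bit (0x40) turns lowercase into uppercase.
assert all(ebcdic(c) | 0x40 == ebcdic(c.upper()) for c in string.ascii_lowercase)
```

The gaps between I and J and between R and S are the alphabet discontinuity mentioned above.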

I never did the column layout, but I knew about the bit flips for control and shift from both Tom Scott's video on reading ASCII, and from reading about the Meta key.

The 8-column layout, however, makes it clear why the ASR-33 had "!" over "1".

Which makes it clear why the Apple I, II and II+ did the same.

  • The ASR-33 even had [ \ ] ^ _ available only via shift + K L M N O. This is called a bit-paired keyboard.

This also shows why on IRC, [\]^ are considered to be case-shifted versions of {|}~.
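Both observations come down to flipping a single bit, which a few lines of Python can show. Modeling Shift on a bit-paired keyboard as an XOR with 0x10 is a simplification for illustration, and the function names are made up.

```python
def bit_paired_shift(ch: str) -> str:
    """Toggle bit 0x10, the way Shift is modeled here for a bit-paired keyboard."""
    return chr(ord(ch) ^ 0x10)


def flip_case_bit(ch: str) -> str:
    """Toggle bit 0x20, the bit that separates upper and lower case."""
    return chr(ord(ch) ^ 0x20)


assert bit_paired_shift("1") == "!"
assert "".join(bit_paired_shift(c) for c in "KLMNO") == "[\\]^_"

# The same bit that maps A to a also maps [ \ ] ^ to { | } ~.
assert flip_case_bit("A") == "a"
assert "".join(flip_case_bit(c) for c in "[\\]^") == "{|}~"
```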

This is new? Us old folks who remember mechanical TTY machines like ASR-33 know this very well.

Of course, kids today don't even have ESC keys because Tim Cook deemed them unnecessary. Nobody uses Vim or Emacs on a Mac! You just use them to browse the web and blog!