
Comment by swores

2 years ago

Great comment, thanks!

(I've sent a quick email suggesting it be added to https://news.ycombinator.com/highlights :)

If you're really into telephony history, the Internet Archive has "The History Of Engineering and Science in the Bell System" (3 volumes) online.

If you have to build reliable distributed systems, it's worth understanding how this was done in the electromechanical era of telephony, where the component reliability was much worse than the system reliability. "Number 5 Crossbar"[1] is worth reading, but hard to follow if you have no idea how telephone switching worked and are unfamiliar with the terminology.

Number 5 Crossbar, in current terms, was a collection of microservices. There was a big, dumb switch fabric, and "markers" which told it what to connect. Other microservices included trunks, originating registers (which listen to incoming dial digits), senders (which sent dial digits to the next switch), billing punches (which recorded toll call data for later billing), translators (which held routing tables), and trouble recorders (which logged errors). Central offices had at least two of each resource, for redundancy. Resources were "seized" as needed from resource pools, with a hardware timeout and alarms to prevent resource lockup. If something went wrong in setting up a call, it was retried once, using different resources. If it failed on the second try, the caller got a fast busy, an alarm sounded, and a trouble recorder dropped a trouble card. Markers did not have persistent state. They started each call with a reset, so they could not get stuck in a bad state.
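For anyone wanting to carry the pattern over to software, here is a minimal Python sketch of the control flow described above: seize units from pools, reclaim anything held past a timeout, retry once with different units, and on a second failure return fast busy and record a trouble card. It is an analogy under stated assumptions, not how No. 5 Crossbar was actually wired. The pool names, the 10-second timeout, `PoolExhausted`, and the `try_connect` callback are all hypothetical.

```python
import time
from dataclasses import dataclass, field


class PoolExhausted(Exception):
    """No free unit of this resource type is available."""


@dataclass
class ResourcePool:
    """A pool of interchangeable units (markers, senders, registers, ...)."""
    name: str
    units: list
    hold_timeout: float = 10.0  # hypothetical timeout, in seconds
    _held: dict = field(default_factory=dict)  # unit -> time it was seized

    def seize(self, exclude=frozenset()):
        """Seize the first free unit that is not in `exclude`."""
        for unit in self.units:
            if unit not in self._held and unit not in exclude:
                self._held[unit] = time.monotonic()
                return unit
        raise PoolExhausted(self.name)

    def release(self, unit):
        self._held.pop(unit, None)

    def reclaim_stale(self, now=None):
        """Software stand-in for the hardware timeout: free units held too
        long and return them so the caller can raise an alarm."""
        now = time.monotonic() if now is None else now
        stale = [u for u, t in self._held.items() if now - t > self.hold_timeout]
        for u in stale:
            del self._held[u]
        return stale


trouble_cards = []  # stand-in for the trouble recorder's punched cards


def setup_call(digits, pools, try_connect):
    """Try call setup at most twice, using different units on the retry."""
    used = {name: set() for name in pools}
    for _attempt in (1, 2):
        seized = {}
        try:
            for name, pool in pools.items():
                unit = pool.seize(exclude=used[name])
                seized[name] = unit
                used[name].add(unit)
            # Each attempt gets a fresh dict: like a marker reset, nothing
            # from a previous call or attempt carries over.
            if try_connect(digits, dict(seized)):
                return "connected"
        except PoolExhausted:
            pass  # treat an empty pool like any other setup failure
        finally:
            # Setup-time resources go back to their pools either way.
            for name, unit in seized.items():
                pools[name].release(unit)
    # Second failure: record which units were involved, give fast busy.
    trouble_cards.append({"digits": digits, "units": used})
    return "fast busy"
```

With two markers and two senders, a fault confined to `marker-0` is masked by the retry, which lands on `marker-1`. Only a failure on both attempts reaches the caller, and it leaves a record naming every unit that was tried, which is the information a maintenance crew would need.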

In the entire history of the Bell System, no electromechanical switching office was ever down for more than 30 minutes for any reason other than a natural disaster or a fire. It's worth understanding how they did that.

[1] https://telephoneworld.org/mdocs-posts/number-5-crossbar-sys...

  • Not truly related to the post content, but there is something about the way these old manuals are formatted/printed that immediately inspires confidence in the contents.

    Maybe because you know that someone spent a lot of time on it before it was published since no adjustments could be made after the fact.

  • > trouble recorders

    This feels like a term a sci-fi author would invent in an alternate history setting to replace "error log" and I find it very humorous.

    • No, just practical.

      The previous version was a panel of blinking lights called the "trouble indicator". When an alarm sounded, someone had to go to the panel and record by hand which lights were on. There were about 200 lights. So the trouble recorder, which recorded that info automatically, was added in larger central offices as an upgrade.[1]

      [1] https://hackaday.com/2022/12/02/stack-trace-from-the-1950s-p...