Comment by ChicagoBoy11
11 hours ago
For anyone who liked this, I highly suggest you take a look at the CuriousMarc youtube channel, where he chronicles lots of efforts to preserve and understand several parts of the Apollo AGC, with a team of really technically competent and passionate collaborators.
One of the more interesting things they have been working on, is a potential re-interpretation of the infamous 1202 alarm. It is, as of current writing, popularly described as something related to nonsensical readings of a sensor which could (and were) safely ignored in the actual moon landing. However, if I remember correctly, some of their investigation revealed that actually there were many conditions which would cause that error to have been extremely critical and would've likely doomed the astronauts. It is super fascinating.
And that's why it's harder (or easier?) to make the same landing again -- we taking way less chances. Today we know of way more failure modes than back then.
They sent people up in a tin can with the bare minimum computational power to manage navigation and control sequencing. It was barely safer than taking a barrel over Niagara Falls. We do have much more capable and reliable technology.
Buzz Aldrin (?) was quoted as recalling holding a pencil inside the capsule as they were out in space and thinking "that wall isn't very thick or strong, I could probably jam a pencil through it pretty easily..."
Death being a layer of aluminum away changes your mind.
It's a miracle nobody died in flight during the program. Exploding oxygen tank, rockets shaking themselves to pieces during launch, getting hit by lightning on top of a flying skyscraper full of kerosene and liquid oxygen....
Gus Grissom, Ed White, and Roger Chaffee died on the Apollo program. I feel it's not polite to ignore that fact even if you add an 'in flight' qualifier.
Starting from the first test pilots, a lot of people died for us to get to the point to launch that flight. So while no one died on the flight, lots of people died just getting us there. If I recall, in The Right Stuff, it's mentioned that those early test pilots had something like a 25% mortality rate.
7 replies →
"popularly described" and how it's currently understood are two different things. Because it's hard to explain to lay people, it's popularly described in a number of simplified ways, but it's well understood.
Since we are on HN, I think it could be explained there (before it's all consumed by AI slop):
For complex reasons, available CPU time during landing was lower than expected (it was stolen by radar pointing peripheral). This caused regularly scheduled job to spawn before previous instance finished. As such, this caused two effects: job instances were preempted before finishing by new instances in the middle of the routine, and that pilling up of the old instances eventually exhausted resources and caused kernel to panic and reboot. Rebooting during landing sounds scary, but that actually was fine: such critical tasks were specifically designed to automatically restart from previously saved checkpoint data in the memory.
What was more dangerous, was the preempted tasks before restarts occured. First, it meant routine wasn't executing to the end, which in actual flight caused blanked displays (as updating the display was the last thing routine was doing). Any more CPU time stolen, and it could be interrupted even earlier, eg. before it sends the engine commands.
Another issue is that in case of fluctuating load, new instances could actually begin running to the end, and then previously preempted job instance could be resumed, potentially sending the stale data to the displays and engine.
And finally, while each job instance had it own core and VAC set properly managed by the kernel (think of it as modern kernel switching between task stacks), that particular routine wasn't designed to be reentrant. So it was using various global variables ("erasables") for its own purpose, that when interrupted in unluckly place might have caused very bad behavior.
How likely all of above is to occur, depends on the exact profile of fluctuating load caused by the confused radar peripheral. I guess that's why Mike Stewart is trying to replicate these issues with real CDU.
Related topic on CuriousMarc and co.’s AGC restoration: https://news.ycombinator.com/item?id=47641528