
Comment by derefr

5 hours ago

> I still struggle to comprehend, even in the slightest, how programmers back then did what they did - and the worlds they created with the limitations they had to work with.

Highly related: two videos covering exactly how they fit...

- Super Mario Bros 1 into 40KiB (https://www.youtube.com/watch?v=1ysdUajrhL8)

- and Super Mario Bros 2 into 256KiB (https://www.youtube.com/watch?v=UdD26eFVzHQ)

I highly recommend watching the videos themselves, since the techniques were very likely devised from a game-dev perspective, rather than by invoking any abstract CS-textbook learning.

But if I did want to summarize the main "tricks" used, in terms of such abstract CS concepts:

1. These old games can be understood as essentially having much of their data (level data, music data, etc) "compressed" using various highly-domain-specific streaming compressors. (I say "understood as" because, while the decompression logic literally exists in the game, there was likely no separate "compression" logic; rather, the data "file formats" were likely just designed to represent everything in this highly-space-efficient encoding. There were no "source files" using a more raw representation; both tooling and hand-edits were likely operating directly against data stored in this encoding.)

2. These streaming compressors act similar to modern multimedia codecs, in the sense that they don't compress sequences-of-structures (which would give low sequence correlation), but instead first decompose the data into distinct, de-correlated sub-streams / channels / planes (i.e. structures-of-sequences), which then "compress" much better.
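A toy sketch of point 2 (all the data here is made up, not from the actual games): interleaved records correlate poorly byte-to-byte, but splitting them into per-field planes groups similar values together, which even a trivial run-length coder then handles well:

```python
from itertools import groupby

# Toy object records: (tile_id, x, y). Interleaved, adjacent bytes correlate poorly.
records = [(5, 0, 12), (5, 1, 12), (5, 2, 12), (7, 3, 12), (7, 4, 12)]

def planes(recs):
    """Decompose sequence-of-structures into structure-of-sequences."""
    return list(zip(*recs))  # one 'plane' per field

def rle(seq):
    """Run-length encode one plane: [(value, run_length), ...]."""
    return [(v, len(list(g))) for v, g in groupby(seq)]

tile_plane, x_plane, y_plane = planes(records)
print(rle(tile_plane))  # tile ids collapse to two runs: [(5, 3), (7, 2)]
print(rle(y_plane))     # the constant y plane collapses to one run: [(12, 5)]
```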

3. Rather than attempting to decompose a single lossless description of the data into several sub-streams that are themselves lossless descriptions of some hyperplane through the data, a different approach is used: that of each sub-channel storing an imperative "painting" logic against a conceptual mutable canvas or buffer shared with other sub-channels. The data stream for any given sub-channel may actually be lossy (i.e. might "paint" something into the buffer that shouldn't appear in the final output), but such "slop"/"bleed" gets overwritten — either by another sub-channel's output, or by something the same sub-channel emits later on in the same "pass". The decompressor essentially "paints over" any mistakes it makes, to arrive at a final flattened canvas state that is a lossless reproduction of the intended state.
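Point 3 in miniature (hypothetical channels, not the real SMB formats): one channel paints cheaply and over-broadly, and a later channel overwrites the "slop" so the flattened canvas comes out exact:

```python
WIDTH = 8
canvas = ["."] * WIDTH

# Channel A: a cheap full-row ground fill. Lossy on its own: it paints
# ground even where the level should have a pit.
for i in range(WIDTH):
    canvas[i] = "G"

# Channel B: punches the pit by overdrawing part of channel A's output.
for i in (3, 4):
    canvas[i] = "."  # paint over the slop

print("".join(canvas))  # "GGG..GGG" — the flattened result is lossless
```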

4. Decompression isn't something done in its entirety into a big in-memory buffer on asset load. (There isn't the RAM to do that!) But nor is decompression a pure streaming operation, cleanly producing sequential outputs. Instead, decompression is incremental: it operates on / writes to one narrow + moving slice of an in-memory data "window buffer" at a time. Which can somewhat be thought of as a ring buffer, because the decompressor coroutine owns whichever slice it's writing to, which is expected to not be read from while it owns it, so it can freely give that slice to its sub-channel "painters" to fill up. (Note that this is a distinct concept from how any long, larger-than-memory data [tilemaps, music] will get spooled out into VRAM/ARAM as it's being scrolled/played. That process is actually done just using boring old blits; but it consumes the same ring-buffer slices the decompressor is producing.)
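Here's a minimal sketch of that ring-buffer idea (sizes and the `level_column` stand-in are invented for illustration): the decompressor only ever owns one moving slice, and scrolling naturally overwrites the stalest slice:

```python
RING_COLS = 4          # columns the window buffer holds at once
COL_HEIGHT = 3         # rows per column
ring = [[0] * COL_HEIGHT for _ in range(RING_COLS)]

def level_column(col):
    """Stand-in for the real decompressor: synthesize column `col`'s tiles."""
    return [col * 10 + row for row in range(COL_HEIGHT)]

def decompress_column(col):
    """Write one column into its ring slot, overwriting the stalest column."""
    ring[col % RING_COLS][:] = level_column(col)

# Scroll across 6 columns of a level; only RING_COLS ever live in memory.
for col in range(6):
    decompress_column(col)

# Columns 4 and 5 have overwritten columns 0 and 1 in the ring.
print(ring[0])  # [40, 41, 42]
```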

5. Different sub-channels may be driven at different granularities and feed into more or fewer windowing/buffering pipeline stages before landing as active state. For example, tilemap data is decompressed into its "window buffer" one page at a time, each time the scroll position crosses a page boundary; but object data is decompressed / "scheduled" into Object Attribute Memory one column at a time (or row at a time, in SMB2, sometimes) every time the scroll position advances by a (meta)tile width.

6. Speaking of metatiles — sub-channels, rather than allowing full flexibility of "write primitive T to offset Y in the buffer", may instead only permit encodings of references to static data tables of design-time pre-composed patterns of primitives. For tilemaps, these patterns are often called "meta-tiles" or "macro-blocks". (This is one reason sub-channels are "lossy" reconstructors: if you can only encode macro-blocks, then you'll often find yourself wanting only some part of a macro-block — which means drawing it and then overdrawing the non-desired parts of it.)
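To make the metatile idea concrete, a toy (entirely made-up) metatile table and stamp routine, including the "draw, then overdraw the unwanted part" move from point 3:

```python
# Hypothetical 2x2 metatile table (ids -> tile patterns); not real SMB data.
METATILES = {
    0: [["B", "B"],
        ["B", "B"]],   # solid brick block
    1: [[".", "."],
        [".", "."]],   # empty
}

def stamp(grid, mt_id, top, left):
    """Copy a pre-composed 2x2 pattern into the tile grid."""
    for dy, row in enumerate(METATILES[mt_id]):
        for dx, tile in enumerate(row):
            grid[top + dy][left + dx] = tile

grid = [["?"] * 4 for _ in range(2)]
stamp(grid, 0, 0, 0)   # we only wanted the left column of bricks...
stamp(grid, 1, 0, 1)   # ...so overdraw the bleed with an empty metatile
print(grid[0])  # ['B', '.', '.', '?']
```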

7. Sub-channels may also operate as fixed-function retained-mode procedural synthesis engines, where rather than specifying specific data to write, you only specify for each timestep how the synthesis parameters should change. This is essentially how modular audio synthesis encoding works; but more interestingly, it's also true of the level data "base terrain" sub-channel, which essentially takes "ceiling" and "ground" brush parameters, and paints these in per column according to some pattern-ID parameter referencing a table of [ceiling width][floor height] combinations. (And the retained-mode part means that for as long as everything stays the same, this sub-channel compresses to nothing!)
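A sketch of that retained-mode terrain synthesis (the pattern table and sizes are hypothetical): each column is painted purely from the current parameters, so unchanged stretches of level cost zero bytes of stream data:

```python
# Hypothetical (ceiling_rows, floor_rows) pattern table; not SMB's actual one.
TERRAIN_PATTERNS = {
    0: (0, 2),   # no ceiling, 2 rows of ground
    1: (1, 2),   # 1-row ceiling, 2 rows of ground
}

HEIGHT = 6
terrain_pattern = 0   # retained-mode state: persists until an opcode changes it

def paint_column():
    """Synthesize one column purely from the current terrain parameters."""
    ceiling, floor = TERRAIN_PATTERNS[terrain_pattern]
    return ["#"] * ceiling + ["."] * (HEIGHT - ceiling - floor) + ["#"] * floor

cols = [paint_column() for _ in range(3)]  # three identical columns: no data needed
terrain_pattern = 1                        # one 'set pattern' opcode in the stream
cols.append(paint_column())
print(cols[0], cols[3])
```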

8. Sub-channels may also contain certain encoded values that branch off into their own special logic, essentially triggering the use of paint-program-like "brushes" to paint arbitrarily within the "canvas." For example, in SMB1, a "pipe tile" is really a pipe brush invocation, that paints a pipe into the window, starting from the tile's encoded position as its top-left corner, painting right two meta-tiles, and downward however-many meta-tiles are required to extend the pipe to the current "base terrain" floor height.
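The pipe brush, sketched (coordinates and grid layout are mine, not SMB's): one encoded position, and the brush infers the rest from the current floor height:

```python
HEIGHT = 8
floor_y = 6  # current base-terrain floor row

# Column-major grid: grid[x][y]
grid = [["."] * HEIGHT for _ in range(6)]

def pipe_brush(grid, x, top_y):
    """Paint a 2-wide pipe from (x, top_y) down to the current floor."""
    for col in (x, x + 1):
        for y in range(top_y, floor_y):
            grid[col][y] = "P"

pipe_brush(grid, 2, 3)  # one encoded position; height is derived, not stored
print(grid[2])  # ['.', '.', '.', 'P', 'P', 'P', '.', '.']
```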

9. Sub-channels may encode values ("event objects") that do not decode to any drawing operation on the target slice buffer, but which instead just execute some code, either immediately upon being encountered ("decompression-time event objects") or at the point where they would have been "placed"/"scheduled" had they been regular objects ("placement-time event objects"). That code usually updates some variable used during the decompression process or at game runtime. (The thing that prevents you from scrolling the screen past the end of map data is a screen-scroll-lock event object, dropped at just the right position to come into effect right before the map would run out of tiles to draw. The thing that determines where a "warp-enabled pipe" will take you is a warp-pipe-targeting event object: once it runs, it applies to every warp-enabled pipe, until the next warp-pipe-targeting event object is encountered.)
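A toy decoder showing the distinction (opcode values and names are invented): some stream entries draw, others just run code against shared modal state:

```python
# Hypothetical opcode values; real SMB event-object encodings differ.
DRAW, SCROLL_LOCK, SET_WARP_TARGET = 0, 1, 2

state = {"scroll_locked": False, "warp_target": None, "tiles": []}

def decode(op, arg):
    if op == DRAW:
        state["tiles"].append(arg)       # a normal drawing operation
    elif op == SCROLL_LOCK:
        state["scroll_locked"] = True    # no buffer write: just executes code
    elif op == SET_WARP_TARGET:
        state["warp_target"] = arg       # modal: applies to all later pipes

stream = [(DRAW, "G"), (SET_WARP_TARGET, "world 4-1"),
          (DRAW, "G"), (SCROLL_LOCK, None)]
for op, arg in stream:
    decode(op, arg)
print(state)
```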

If at least some of these sub-channels are starting to sound like essentially a bytecode ISA for some kind of abstract machine — yes, exactly. Things like "event objects" and "brush invocations" can be more easily understood as opcodes (sometimes with immediates!); and the "modal variables" as the registers of these instruction streams' abstract machines.

[continued...]

10. The interesting thing about these instruction streams, though, is that they're all being driven in lockstep externally by the decompressor. None of the level-data ISAs contain anything like a backward JMP-like opcode, because each level-data sub-channel's bytecode interpreter has a finite timeslice to execute per decompression timestep, so allowing back-edges [and so loops] would make the level designers into the engine developers' worst enemy. But most of the ISAs do contain forward JMPs, to essentially encode things like "no objects until [N] [columns/pages] from now." (And a backward JMP instruction does exist in the music-data parameterized-synthesis sub-channel ISA [which as it happens isn't interpreted by the CPU, but is rather the native ISA of the NES's Audio Processing Unit.] If you ever wondered how music keeps not only playing but looping even if the game crashes, it's because the music program is loaded and running on the APU and just happily executing its own loop instructions forever, waiting for the CPU to come interrupt it!)
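The forward-JMP-only property can be sketched like this (opcodes invented): because the program counter only ever advances, the interpreter is trivially guaranteed to finish its timeslice:

```python
# Hypothetical opcodes: (PLACE, obj) or (SKIP, n_columns). Forward jumps only.
PLACE, SKIP = 0, 1

def run(stream, total_cols):
    placed = {}          # column -> object
    col, pc = 0, 0
    while col < total_cols and pc < len(stream):
        op, arg = stream[pc]
        if op == SKIP:
            col += arg   # "no objects until `arg` columns from now"
        else:
            placed[col] = arg
            col += 1
        pc += 1          # pc only ever advances: no back-edges, no loops
    return placed

print(run([(PLACE, "pipe"), (SKIP, 3), (PLACE, "block")], 16))
# {0: 'pipe', 4: 'block'}
```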

11. These sub-channel ISAs are themselves designed to be as space-efficient as possible while still being able to be directly executed without any kind of pre-transformation. They're often variable-length, with most instructions being single-byte. Opcodes are hand-placed into the same kind of bit-level Huffman trie you'd expect a DEFLATE-like algorithm to design if it were tasked with compressing a large corpus of fixed-length bytecode. Very common instructions (e.g. a brush to draw a horizontal line of a particular metatile across a page up to the page boundary) might be assigned a very short prefix code (e.g. `11`), allowing the other six bits in that instruction byte to select a metatile to paint with from a per-tilemap metatile palette table. Rarer instructions, meanwhile, might take 2 bytes to express, because they need to "get out of the way of" all the common prefixes. (You could think of these opcodes as being filed under a chain of "Misc -> Misc -> Etc -> ..." prefixes.)
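That prefix-coded single-byte encoding, as a toy decoder (the prefix assignment here is illustrative, not SMB's actual opcode map):

```python
def decode_byte(b):
    """Toy opcode decoder: the 2-bit prefix 0b11 marks the common
    'horizontal line of metatile' brush, and the remaining 6 bits pick
    a metatile from a per-tilemap palette table."""
    if b >> 6 == 0b11:
        return ("hline_brush", b & 0x3F)  # 1 byte, immediate packed in
    return ("rare_op", b)                 # would need a 2nd byte in practice

print(decode_byte(0b11_000101))  # ('hline_brush', 5)
print(decode_byte(0b00_000101))  # ('rare_op', 5)
```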

IMHO, these are all (so far) things that could be studied as generalizable data-compression techniques.

But here are two more techniques that are much more specific to game-dev, where you can change and constrain the data (i.e. redesign the level!) to fit the compressor:

12. Certain ISAs have opcodes that decode to entirely-distinct instructions, depending on the current states of some modal variables! (My guess is that this came about either due to more level features being added late in development after the ISAs had mostly been finalized; or due to wanting to further optimize data size and so seeing an opportunity to "collapse" certain instructions together.) This mostly applies to "brush" opcodes. The actual brush logic they invoke can depend on what the decoder currently sees as the value of the "level type" variable. In one level type, opcode X is an Nx[floor distance] hill; while in another level type, opcode X is a whale, complete with water spout! (In theory, they could have had an opcode to switch level type mid-level. Nothing in this part of the design would have prevented that; it is instead only impractical for other reasons that are out-of-scope here, to do with graphics memory / tileset loading.)
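The mode-dependent dispatch, sketched (level-type names and the 0x40 opcode are hypothetical):

```python
# Hypothetical dispatch table: same opcode, different brush per level type.
BRUSHES = {
    ("overworld", 0x40): "hill",
    ("water",     0x40): "whale",
}

level_type = "overworld"   # modal variable owned by the decoder

def decode(op):
    """Resolve an opcode against the current modal state, not just its bits."""
    return BRUSHES[(level_type, op)]

print(decode(0x40))   # hill
level_type = "water"
print(decode(0x40))   # whale — same byte, entirely different brush
```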

13. And, even weirder: certain opcodes decode to entirely-distinct instructions depending on the current value of the 'page' or 'column' register, or even the precise "instruction pointer" register (i.e. the current 'row' within the 'column'). In other words, if you picture yourself using a level editor tool, and dragging some particular object/brush type across the screen, then it might either "snap" to / only allow placement upon metatiles where the top-left metatile of the object lands on a metatile at a position that is e.g. X%4==1 within its page; or it might "rotate" the thing being dragged between being one of four different objects as you slide it across the different X positions of the metatile grid. (This one's my favorite, because you can see the fingerprint of it in much of the level design of the game. For example: the end of every stage returns the floor height to 2, so that "ground level" is at Y=13. Why? Because flagpole and castle objects are only flagpole and castle objects when placed at Y=13!)
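And the position-dependent decoding, as a last toy sketch (opcodes, the mod-4 phase, and object names are all invented; only the "flagpole needs Y=13" behavior is from the games):

```python
# Hypothetical: one opcode decodes to different objects by X position mod 4.
OBJECTS_BY_PHASE = ["bush", "cloud", "hill", "tree"]

def decode(op, x):
    if op == 0x20:                      # the 'scenery' opcode
        return OBJECTS_BY_PHASE[x % 4]  # meaning rotates with grid position
    return "other"

def decode_flagpole(op, y):
    # ...and some objects only exist at one Y: a flagpole only at y == 13,
    # which is why every stage ends with the floor returned to ground level.
    return "flagpole" if (op == 0x21 and y == 13) else "plain tile"

print([decode(0x20, x) for x in range(4)])  # ['bush', 'cloud', 'hill', 'tree']
print(decode_flagpole(0x21, 13))            # flagpole
```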