Comment by richardfey
11 hours ago
This kind of optical-media seek-time testing/optimisation was done for PS1 games like Crash Bandicoot. You certainly have more and better context than me on this console/game; I just mention it in case it wasn't considered.
By the way, could the nonsensical offsets be checksums instead?
Nice reverse engineering work and analysis there!
IIRC the average seek time across optical media is around 120ms, so ideally you want all reads to be linear.
On one game I worked on, I spent months optimising loading, especially the boot flow, to ensure that every file the game was going to load was the very next file on the disk, or else the next file was an optionally loaded file that could be skipped (as reading and ignoring was quicker than seeking). For the few non-deterministic cases where order couldn't be predicted (e.g. music loaded from a different thread), I preloaded a bunch of assets up front so that the rest of the loads were deterministic.
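The "read and ignore vs. seek" trade-off above can be sketched with a back-of-envelope calculation. The numbers here are my own assumptions, not from the comment: at roughly 2x CD-ROM speed (~300 KB/s, i.e. ~300 bytes/ms) and the ~120 ms average seek mentioned earlier, any skippable file under about 36 KB is cheaper to stream through than to seek past.

```c
#include <stdint.h>

/* Illustrative constants, not measured values. */
#define BYTES_PER_MS 300u   /* ~2x CD-ROM transfer rate (~300 KB/s) */
#define AVG_SEEK_MS  120u   /* average optical seek time from the comment */

/* Returns 1 if reading and discarding the unwanted file is expected
   to be faster than performing a seek past it, 0 otherwise. */
int read_through_is_cheaper(uint32_t file_bytes)
{
    uint32_t read_ms = file_bytes / BYTES_PER_MS;
    return read_ms < AVG_SEEK_MS;
}
```

With these assumptions, a 30 KB optional file is worth streaming through (~100 ms of reading), while a 40 KB one is not (~133 ms, worse than the seek).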
One fun thing we often did around this era was to eschew filenames and instead hash the name. If we were loading a file directly from C code, we'd use the preprocessor to hash the name via some complicated macros, so the final call would be compiled like LoadAsset(0x184e49da), but we still retained a run-time hasher for cases where the filename was generated dynamically. This seems like a weird optimisation, but avoiding the directory scan and filename comparisons can save a lot of unnecessary seeking / CPU operations, especially for multi-level directories. The "file table" then just became a list of disk offsets and lengths, with a few gaps because the hash table was sized a little bigger than the number of files to avoid hash conflicts. Ironically, on one title I worked on we had the same modulo for about two years in development, and just before launch we needed to change it twice in a week due to conflicts!
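A minimal sketch of the scheme above, with assumptions of mine: I'm using FNV-1a as a stand-in for whatever hash the "complicated macros" computed, and the table size, entry layout, and function names are all hypothetical. The key ideas from the comment survive: lookups are by hash modulo the table size, the table is slightly oversized so the chosen modulo has no conflicts, and the entries are just disk offsets and lengths.

```c
#include <stdint.h>
#include <stddef.h>

/* Stand-in 32-bit FNV-1a string hash (the run-time hasher; the
   compile-time macro version would produce the same value). */
uint32_t hash_name(const char *s)
{
    uint32_t h = 2166136261u;            /* FNV-1a offset basis */
    while (*s) {
        h ^= (uint8_t)*s++;
        h *= 16777619u;                  /* FNV-1a prime */
    }
    return h;
}

/* Hypothetical table size: a little bigger than the file count,
   leaving a few empty slots (length == 0) so the modulo is conflict-free. */
#define TABLE_SIZE 8

typedef struct {
    uint32_t hash;    /* full hash, to verify the slot on lookup */
    uint32_t offset;  /* byte offset of the file on disk */
    uint32_t length;  /* file length in bytes; 0 means empty slot */
} FileEntry;

FileEntry file_table[TABLE_SIZE];

/* Build step: place a file in its slot; a collision here means the
   modulo/table size must change (the "twice in a week" scenario). */
int table_insert(uint32_t hash, uint32_t offset, uint32_t length)
{
    FileEntry *e = &file_table[hash % TABLE_SIZE];
    if (e->length != 0)
        return -1;                       /* hash conflict at this modulo */
    e->hash = hash;
    e->offset = offset;
    e->length = length;
    return 0;
}

/* The run-time side of LoadAsset(0x184e49da): no directory scan,
   no string compares, just one index and one verify. */
const FileEntry *find_asset(uint32_t hash)
{
    const FileEntry *e = &file_table[hash % TABLE_SIZE];
    if (e->length == 0 || e->hash != hash)
        return NULL;                     /* no such file */
    return e;
}
```

The full hash is stored alongside the offset/length so a lookup can confirm it landed on the right entry rather than an empty or mismatched slot.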
This reminds me of Mel:
https://users.cs.utah.edu/~elb/folklore/mel.html
Thank you.
> By the way, could the nonsensical offsets be checksums instead?
If you're referring to those weird "addresses" that quickly became irrelevant, there's a CRC32 in the header immediately after them, so they're unlikely to be checksums themselves. The address value is also the same across files with different contents.
I was talking to a friend of mine about it and he suggested that maybe whatever process generated the files included the file's load address in case it could be mapped to the same address for some other optimization?