Comment by jcranmer
7 hours ago
The most important resource you'll need is a hex editor that lets you place a cursor anywhere in the file and see the value at that position interpreted as each of the basic datatypes (u8/u16/u32/u64, float, and double, at minimum). Something like 010 Editor or ImHex.
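For a rough idea of what that cursor view computes, here is a minimal Python sketch using the standard `struct` module. The function name `interpret_at` and the sample bytes are made up for illustration; a real hex editor does much more (signed types, strings, timestamps, and so on).

```python
import struct

def interpret_at(data: bytes, offset: int, little_endian: bool = True) -> dict:
    """Decode the bytes at `offset` as each basic type, roughly like a hex editor's inspector."""
    prefix = "<" if little_endian else ">"
    formats = {"u8": "B", "u16": "H", "u32": "I", "u64": "Q", "float": "f", "double": "d"}
    values = {}
    for name, code in formats.items():
        size = struct.calcsize(prefix + code)
        chunk = data[offset:offset + size]
        # Skip any type that would read past the end of the data.
        if len(chunk) == size:
            values[name] = struct.unpack(prefix + code, chunk)[0]
    return values

# Hypothetical sample: the 8 bytes of a little-endian double holding 1.5.
sample = struct.pack("<d", 1.5)
for name, value in interpret_at(sample, 0).items():
    print(f"{name:>6}: {value}")
```

Seeing all the interpretations side by side is the point: usually only one of them looks like a plausible value, and that tells you the field's type.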
If it's a really simple format, since you appear to have the ability to generate arbitrary file contents using the program, you can get some mileage by generating a suite of small files with few changes between them. I reverse engineered the DSP sphere blueprint format by generating a blueprint with one node, then the same node located elsewhere, then two nodes, then two nodes and one frame between them, etc. But this process is really only possible for the simplest formats; I'd wager that most reverse-engineered file formats are heavily based on decompilation of the deserialization code.
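The "few changes between files" approach boils down to a byte-level diff. A minimal sketch, assuming the two files are small enough to hold in memory; `diff_bytes` is a hypothetical helper, and the sample layout (a 4-byte tag followed by two little-endian integers) is invented for the example and is not the real blueprint format:

```python
import struct

def diff_bytes(a: bytes, b: bytes) -> list:
    """Return (offset, byte_in_a, byte_in_b) for every position where the inputs differ.

    A byte missing from the shorter input is reported as None.
    """
    diffs = []
    for i in range(max(len(a), len(b))):
        x = a[i] if i < len(a) else None
        y = b[i] if i < len(b) else None
        if x != y:
            diffs.append((i, x, y))
    return diffs

# Invented layout: tag, then x and y coordinates of a single node.
baseline = b"BLP\x01" + struct.pack("<ii", 10, 20)
moved = b"BLP\x01" + struct.pack("<ii", 10, 30)  # same node, different y

for offset, old, new in diff_bytes(baseline, moved):
    print(f"offset {offset}: {old} -> {new}")
```

If changing one thing in the program changes exactly one small region of the file, you have probably found where that value lives. If half the file changes, the format is likely compressed, checksummed, or otherwise not amenable to this trick.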
A lot of binary file formats end up being some form of "container" format: essentially, a file contains some form of directory mapping an item ID to a location in the file, and the contents of each item are in some other binary format. It's worth first checking if this is the case, and matching against known formats like ZIP or HDF5.
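Matching against known formats usually starts with the first few bytes of the file. A small sketch of that check; the table is deliberately tiny and `sniff_container` is a made-up name. Note that it only looks at offset 0, while HDF5 may place its signature after a user block and ZIP is properly located from the end of the file, so treat a miss as inconclusive:

```python
import io
import zipfile

# Leading signatures of a few well-known formats.
KNOWN_MAGICS = {
    b"PK\x03\x04": "ZIP",
    b"\x89HDF\r\n\x1a\n": "HDF5",
    b"\x1f\x8b": "gzip",
    b"SQLite format 3\x00": "SQLite",
}

def sniff_container(header: bytes):
    """Return the name of a known format whose signature starts `header`, else None."""
    for magic, name in KNOWN_MAGICS.items():
        if header.startswith(magic):
            return name
    return None

# Build a small ZIP in memory to try it out.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("item.bin", b"\x00\x01\x02")

print(sniff_container(buf.getvalue()[:16]))
```

On a real system the `file` command does the same job with a far larger signature database.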
That sounds interesting. But how can you test these internal binary formats? Do I need to extract them somehow?
ImHex will tell you if it's compressed. Do you understand data structures? Floats, all those data types?
I'd suggest looking at a format like msgpack to see what a binary data format could look like: https://msgpack.org/
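To make that concrete, here is a hand-rolled decoder for a small subset of MessagePack (small non-negative integers, short strings, short maps and arrays, nil, booleans, and 64-bit floats). It is only a sketch for reading the byte layout, not a replacement for a real msgpack library, and it rejects anything outside that subset:

```python
import struct

def decode(buf: bytes, pos: int = 0):
    """Decode one value from a small MessagePack subset; return (value, next_pos)."""
    tag = buf[pos]
    pos += 1
    if tag <= 0x7F:                      # positive fixint: the tag is the value
        return tag, pos
    if 0x80 <= tag <= 0x8F:              # fixmap: low 4 bits hold the pair count
        result = {}
        for _ in range(tag & 0x0F):
            key, pos = decode(buf, pos)
            value, pos = decode(buf, pos)
            result[key] = value
        return result, pos
    if 0x90 <= tag <= 0x9F:              # fixarray: low 4 bits hold the item count
        items = []
        for _ in range(tag & 0x0F):
            item, pos = decode(buf, pos)
            items.append(item)
        return items, pos
    if 0xA0 <= tag <= 0xBF:              # fixstr: low 5 bits hold the byte length
        length = tag & 0x1F
        return buf[pos:pos + length].decode("utf-8"), pos + length
    if tag == 0xC0:                      # nil
        return None, pos
    if tag == 0xC2:                      # false
        return False, pos
    if tag == 0xC3:                      # true
        return True, pos
    if tag == 0xCB:                      # float64, big-endian
        return struct.unpack_from(">d", buf, pos)[0], pos + 8
    raise ValueError(f"type byte 0x{tag:02x} is outside this sketch's subset")

# The 18-byte encoding of {"compact": true, "schema": 0}.
encoded = bytes.fromhex("82a7636f6d70616374c3a6736368656d6100")
value, end = decode(encoded)
print(value, end)
```

The pattern to notice is a type byte, sometimes carrying a length, followed by the payload. Many binary formats are variations on that idea.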
Then be aware that proprietary formats are going to be a lot more complicated. Or maybe it's just zipped-up JSON data; the only way to tell is to start poking around at it.
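One cheap way to start poking is to try the obvious wrappers and see whether JSON falls out. A sketch using only the standard library; `try_json_payload` is a hypothetical helper, and it assumes the whole file fits in memory and that "zipped" means gzip, raw zlib, or a ZIP archive:

```python
import gzip
import io
import json
import zipfile
import zlib

def try_json_payload(blob: bytes):
    """Return parsed JSON if `blob` is JSON, bare or wrapped in gzip/zlib/ZIP; else None."""
    candidates = [blob]
    # Try stream decompressors; a failure just means it was not that format.
    for decompress in (gzip.decompress, zlib.decompress):
        try:
            candidates.append(decompress(blob))
        except (OSError, EOFError, zlib.error):
            pass
    # If it is a ZIP archive, consider every member as a candidate.
    if zipfile.is_zipfile(io.BytesIO(blob)):
        with zipfile.ZipFile(io.BytesIO(blob)) as zf:
            for name in zf.namelist():
                candidates.append(zf.read(name))
    for candidate in candidates:
        try:
            return json.loads(candidate)
        except ValueError:  # also covers JSONDecodeError and UnicodeDecodeError
            pass
    return None

payload = json.dumps({"nodes": 2}).encode("utf-8")
print(try_json_payload(gzip.compress(payload)))
print(try_json_payload(b"not json at all"))
```

If this returns None, nothing is lost; it only rules out the easy case before you go back to the hex editor.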