Comment by zzo38computer
5 years ago
They say that Microsoft Word, PowerPoint, etc. are "fully custom formats", but those formats are actually "wrapped pile-of-files formats" (7-Zip can open them), even though the files inside them use custom formats (a category they seem to have failed to consider).
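As a rough illustration of the difference between these containers, a file's container type can usually be recognized from its first few bytes. The sketch below checks three well-known signatures: ZIP (used by .docx, .pptx), the Compound File container (used by the older .doc, .ppt), and SQLite. The function names are hypothetical, and real identification tools handle far more cases (for example, an empty ZIP starts with a different record).

```python
# Well-known leading signatures of some container formats.
CONTAINER_SIGNATURES = [
    (b"PK\x03\x04", "ZIP"),                                    # local file header
    (bytes.fromhex("D0CF11E0A1B11AE1"), "Compound File"),      # OLE2 / DocFile
    (b"SQLite format 3\x00", "SQLite 3"),                      # 16-byte header string
]


def sniff_container(header: bytes) -> str:
    """Return the container name whose signature starts `header`."""
    for magic, name in CONTAINER_SIGNATURES:
        if header.startswith(magic):
            return name
    return "unknown"


def sniff_file(path: str) -> str:
    """Read just enough bytes to cover the longest signature (16)."""
    with open(path, "rb") as f:
        return sniff_container(f.read(16))
```

A format without a magic number, such as a Hamster archive (which begins directly with a lump name), cannot be identified this way and would come back as "unknown".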
The application ID number in the SQLite header can be used to identify application file formats. It is a 32-bit number, and there have been a few different conventions for choosing it: I have seen hexadecimal and ASCII used; I used base thirty-six; and later I saw a suggestion to use RADIX-50. Additionally, the SQLite documentation has a page titled "Defense Against The Dark Arts" in case you need to load untrusted files.
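To make the base-thirty-six convention concrete, here is a minimal sketch that stores an application ID with PRAGMA application_id and reads it back from the raw header, where SQLite keeps it as a big-endian 32-bit field at byte offset 68. The TeXnicard ID mentioned below, 1778603844, decodes to "TEXNIC" in base 36 (digits 0-9 then A-Z). The helper names and the "cards" table are hypothetical.

```python
import sqlite3
import struct

BASE36_DIGITS = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"
SQLITE_MAGIC = b"SQLite format 3\x00"


def base36_encode(n: int) -> str:
    """Render a non-negative application ID using digits 0-9 and A-Z."""
    if n < 0:
        raise ValueError("expected a non-negative application ID")
    digits = []
    while True:
        n, r = divmod(n, 36)
        digits.append(BASE36_DIGITS[r])
        if n == 0:
            break
    return "".join(reversed(digits))


def base36_decode(text: str) -> int:
    """Parse a base-36 name; the result must fit in the 32-bit field."""
    n = int(text, 36)
    if n >= 2**32:
        raise ValueError("does not fit in a 32-bit application ID")
    return n


def create_with_application_id(path: str, app_id: int) -> None:
    """Create a database whose header carries `app_id` (unsigned)."""
    # PRAGMA application_id works with a signed 32-bit integer.
    signed = app_id - 2**32 if app_id >= 2**31 else app_id
    conn = sqlite3.connect(path)
    try:
        conn.execute(f"PRAGMA application_id = {signed}")
        conn.execute("CREATE TABLE IF NOT EXISTS cards(name TEXT)")
        conn.commit()
    finally:
        conn.close()


def read_application_id(path: str) -> int:
    """Read the application ID directly from the 100-byte file header."""
    with open(path, "rb") as f:
        header = f.read(100)
    if len(header) < 100 or not header.startswith(SQLITE_MAGIC):
        raise ValueError("not a SQLite database file")
    (app_id,) = struct.unpack(">I", header[68:72])  # big-endian, offset 68
    return app_id
```

Reading the header directly lets a program reject a file with the wrong ID before handing it to SQLite at all.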
TeXnicard uses a SQLite database file (with application ID 1778603844) for the card database file. The version control file (optional, and not fully implemented yet) uses a custom format (which is fully documented), and it does support atomic transactions. It consists of a header followed by a sequence of frames, which are key frames and delta frames. The header of the version control file contains two pointers: one to the beginning of the most recently committed key frame, and the other one to the end of the most recently committed frame (whether a key frame or a delta frame; if all frames are fully committed, this will be equal to the length of the file). These pointers are written only after the rest of the file is written; if it gets interrupted, reads will ignore the partially written data, and further writes will overwrite the partially written data.
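The commit protocol described above (write the frame data first, then publish it by updating the header pointers) can be sketched as follows. The byte layout here is an assumption made up for illustration and is not the documented TeXnicard format: the header is two big-endian 64-bit offsets, and each frame is a 1-byte type (b"K" for key, b"D" for delta) plus a 32-bit payload length. The sketch also assumes the small header write is not torn by a crash.

```python
import os
import struct

# Hypothetical layout, for illustration only.
HEADER = struct.Struct(">QQ")     # (start of last key frame, end of last committed frame)
FRAME_HDR = struct.Struct(">cI")  # (frame type, payload length)


def vc_create(path: str) -> None:
    """Create an empty file: no key frame yet, committed end just after the header."""
    with open(path, "wb") as f:
        f.write(HEADER.pack(0, HEADER.size))
        f.flush()
        os.fsync(f.fileno())


def _read_header(f):
    f.seek(0)
    return HEADER.unpack(f.read(HEADER.size))


def vc_append(path: str, kind: bytes, payload: bytes) -> None:
    """Append one frame, then commit it by rewriting the header pointers."""
    with open(path, "r+b") as f:
        last_key, end = _read_header(f)
        # Start at the committed end, overwriting any partially written frame.
        f.seek(end)
        f.write(FRAME_HDR.pack(kind, len(payload)))
        f.write(payload)
        f.truncate()  # drop leftover bytes from an earlier interrupted write
        f.flush()
        os.fsync(f.fileno())
        # Only now make the new frame visible to readers.
        new_end = end + FRAME_HDR.size + len(payload)
        if kind == b"K":
            last_key = end
        f.seek(0)
        f.write(HEADER.pack(last_key, new_end))
        f.flush()
        os.fsync(f.fileno())


def vc_read(path: str):
    """Return (last key frame offset, committed frames); ignores uncommitted bytes."""
    with open(path, "rb") as f:
        last_key, end = _read_header(f)
        frames = []
        pos = HEADER.size
        while pos < end:
            f.seek(pos)
            kind, length = FRAME_HDR.unpack(f.read(FRAME_HDR.size))
            frames.append((pos, kind, f.read(length)))
            pos += FRAME_HDR.size + length
        return last_key, frames
```

If a write is interrupted after the frame bytes but before the header update, readers stop at the old committed end and the next append simply writes over the leftover bytes.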
ZZ Zero uses a Hamster archive of custom (but documented) formats as its world file format. (A Hamster archive is zero or more "lumps" concatenated together. A lump consists of a null-terminated ASCII filename, 32-bit PDP-endian data size (measured in bytes), and then the data of that lump. The preceding text in these parentheses is the full definition of the Hamster archive format; you can use this to implement your own.)
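Following that definition, a reader and writer fit in a few lines. This sketch assumes the usual PDP-11 meaning of PDP-endian for a 32-bit value: the high 16-bit word comes first and each word is stored little-endian, so 0x0A0B0C0D is stored as the bytes 0B 0A 0D 0C. The function names are hypothetical.

```python
import struct


def pdp32_pack(n: int) -> bytes:
    """Encode a 32-bit size PDP-endian: high word first, each word little-endian."""
    if not 0 <= n <= 0xFFFFFFFF:
        raise ValueError("size does not fit in 32 bits")
    return struct.pack("<HH", n >> 16, n & 0xFFFF)


def pdp32_unpack(b: bytes) -> int:
    hi, lo = struct.unpack("<HH", b)
    return (hi << 16) | lo


def write_hamster(lumps) -> bytes:
    """Concatenate (name, data) lumps: NUL-terminated ASCII name, size, data."""
    out = bytearray()
    for name, data in lumps:
        encoded = name.encode("ascii")
        if b"\0" in encoded:
            raise ValueError("lump name may not contain NUL")
        out += encoded + b"\0" + pdp32_pack(len(data)) + data
    return bytes(out)


def read_hamster(buf: bytes):
    """Split an archive back into a list of (name, data) pairs."""
    lumps = []
    pos = 0
    while pos < len(buf):
        nul = buf.find(b"\0", pos)
        if nul < 0:
            raise ValueError("truncated lump name")
        name = buf[pos:nul].decode("ascii")
        size_at = nul + 1
        if size_at + 4 > len(buf):
            raise ValueError("truncated lump size")
        size = pdp32_unpack(buf[size_at:size_at + 4])
        data_at = size_at + 4
        if data_at + size > len(buf):
            raise ValueError("truncated lump data")
        lumps.append((name, buf[data_at:data_at + size]))
        pos = data_at + size
    return lumps
```

An archive with zero lumps is simply an empty file, which this reader returns as an empty list.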
Free Hero Mesh uses a "pile-of-wrapped-pile-of-files" format. A puzzle set consists of four files: .class (which stores class definitions), .xclass (which stores pictures and sounds to be used by the class definitions), .level (which stores levels), and .solution (which stores solutions). The .class file is a plain text file; the other three are Hamster archives. These are four logically distinct parts of a puzzle set; keeping them separate allows you to split them apart, to create symlinks that share class definitions between puzzle sets, to substitute your own graphics, to work with multiple solution sets (e.g. per user), etc. If you need to do more than that, you can of course extract the lumps. For class definitions, you can just copy and paste the text.
MegaZeux used a fully custom format before, but now it uses a ZIP archive whose contents are custom formats (one of which is the "MegaZeux Property List" format, which I have documented in the Just Solve The File Format wiki; the authors of MegaZeux did not seem to document this format anywhere, so I worked it out and documented it myself).
In some cases, a SQLite database is a good application file format; in other cases, I think other formats (such as text formats) may be better. It depends on the application. XML is too often used for data that isn't text markup, and I think it is especially bad for such data.
If you use SQLite, though, you get more than just database access. It also gives you the string builder functions, the sqlite3_mprintf function, a page cache implementation, memory usage statistics, and a SQL interpreter; the SQL interpreter can be used as one way to allow user customization and user queries (including batch operations), without having to design an entirely new scripting language to embed.
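As one possible sketch of the user-query side of this idea, Python's sqlite3 module exposes SQLite's authorizer hook as Connection.set_authorizer, which can restrict user-supplied SQL to reads. The "cards" table and its contents are hypothetical, and a real application allowing batch operations would permit more action codes than this read-only whitelist.

```python
import sqlite3

# Action codes a read-only user query needs: the SELECT itself,
# column reads, and function calls such as count().
ALLOWED_ACTIONS = {sqlite3.SQLITE_SELECT, sqlite3.SQLITE_READ, sqlite3.SQLITE_FUNCTION}


def read_only_authorizer(action, arg1, arg2, db_name, trigger_name):
    """Allow reads; deny everything else (writes, PRAGMA, ATTACH, BEGIN, ...)."""
    return sqlite3.SQLITE_OK if action in ALLOWED_ACTIONS else sqlite3.SQLITE_DENY


def make_demo_db() -> sqlite3.Connection:
    """Build a small hypothetical card table, then lock the connection down."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE cards(name TEXT, cost INTEGER);
        INSERT INTO cards VALUES ('Goblin', 1), ('Dragon', 6), ('Knight', 3);
        """
    )
    conn.set_authorizer(read_only_authorizer)  # from here on, only reads pass
    return conn


def run_user_query(conn: sqlite3.Connection, sql: str, params=()):
    """Run one user-supplied statement; denied statements raise DatabaseError."""
    return conn.execute(sql, params).fetchall()
```

For example, run_user_query(conn, "SELECT name FROM cards WHERE cost <= ? ORDER BY cost", (3,)) returns the cheap cards, while "DELETE FROM cards" is rejected when the statement is prepared.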
They mention that SQLite interfaces are available for many other programming languages, although at least one language that doesn't seem to have an interface to SQLite is PostScript (you can use %pipe%, but it doesn't work well, especially since it is only a one-way pipe), and I am not sure whether awk has one either.
.doc, .ppt, &c. were fully custom. .docx, .pptx, &c. are ZIP files. Remember that this was written in 2014, and many people were still using the old formats then, because many people didn’t have Office 2007 or newer, or other software that could handle the new format. (Now I think very few people still use the old Office formats.)
No, .doc, .ppt, etc. are wrapped pile-of-files formats too, although they don't use a ZIP container. I also believe Visual Pinball uses the same container format as Microsoft Word. 7-Zip just calls this format "Compound", although it has other names, such as "DocFile" and "POIFS". I just tried opening such a file, and I got the following listing from 7-Zip: