← Back to context

Comment by cubefox

1 day ago

I'm pretty sure you can change various file formats without rewriting the entire file and without using "incremental updates".

You can’t insert data into the middle of a file (or remove portions from the middle of a file) without either rewriting it completely, or at least rewriting everything after the insertion point; the latter requires holding everything after the insertion point in memory (or writing it out to another file first, then reading it in and writing it out again).

PDF is designed to not require holding the complete file in memory. (PDF viewers can display PDFs larger than available memory, as long as the currently displayed page and associated metadata fits in memory. Similar for editing.)

  • While tedious, you can do the rewrite block-wise from the insertion point and only store a an additional block's worth of the rest (or twice as much as you inserted)

    ABCDE, to insert 1 after C: store D, overwrite D with 1, store E, overwrite E with D, write E.

No, if you are going to change the structure of a structured document that has been saved to disk, your options are:

1) Rewrite the file to disk 2) Append the new data/metadata to the end of the existing file

I suppose you could pre-pad documents with empty blocks and then go modify those in situ by binary editing the file, but that sounds like a nightmare.

  • Aren't there file systems that support data structures which allow editing just part of the data, like linked lists?

    • Yeah there are, Linux supports parameters FALLOC_FL_INSERT_RANGE and FALLOC_FL_COLLAPSE_RANGE for fallocate(2). Like most fancy filesystem features, they are not used by the vast majority of software because it has to run on any filesystem so you'd always need to maintain two implementations (and extensive test cases).

      8 replies →

    • Look at the C file API which most software is based on, it simply doesn’t allow it. Writing at a given file position just overwrites existing content. There is no way to insert or remove bytes in the middle.

      Apart from that, file systems manage storage in larger fixed-size blocks (commonly 4 KB). One block typically links to the next block (if any) of the same file, but that’s about the extent of it.

      1 reply →

    • No. Well yes. On mainframes.

      This is why “table of contents at the end” is such an exceedingly common design choice.

This was 1996. A typical computer had tens of megabytes of memory with throughput a fraction of what we have today. Appending an element instead of reading, parsing, inserting and validating the entire document is a better solution in so many ways. That people doing redactions don't understand the technology is a separate problem. The context matters.