
Comment by londons_explore

4 days ago

The filesystem doesn't have access to the right existing ECC data to be able to add a few bytes to do the job. It would need to store a whole extra copy.

There are potentially ways a filesystem could use hierarchical ECC to store only a small percentage extra, but it would be far from theoretically optimal, and it would rely on only a few logical blocks of the drive becoming unreadable, with those blocks not being correlated in write time (which I imagine isn't true for most SSD firmware).
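As a rough sketch of that idea (my illustration, not anything a real filesystem does): store one XOR parity block per group of logical blocks, RAID-5 style. That recovers a single lost block per group at a few percent overhead, but fails exactly when losses within a group are correlated:

```python
def xor_parity(blocks: list[bytes]) -> bytes:
    """One parity block over a group of equal-sized logical blocks."""
    parity = bytearray(len(blocks[0]))
    for blk in blocks:
        for i, b in enumerate(blk):
            parity[i] ^= b
    return bytes(parity)

def recover_block(surviving: list[bytes], parity: bytes) -> bytes:
    """Rebuild the single missing block: XOR of the parity with all survivors."""
    return xor_parity(surviving + [parity])
```

With, say, 32 blocks per group the overhead is about 3%, but two unreadable blocks in the same group (the correlated-failure case above) are unrecoverable.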

CD storage has an interesting take: the available sector size varies by use, i.e. audio or MPEG-1 video (Video CD) at 2352 data octets per sector (with two media-level ECCs), versus actual data at 2048 octets per sector, where the extra EDC/ECC can be exposed by reading "raw". I learned this the hard way with VideoPack's malformed VCD images; I wrote a tool to post-process the images and recreate the correct EDC/ECC per sector. Fun fact: ISO 9660 stores file metadata simultaneously in big-endian and little-endian form (AFAIR VP used to fluff that up too).
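The ISO 9660 "both-byte order" convention mentioned above fits in a few lines (a sketch; real directory-record field offsets are omitted). Many numeric fields are stored twice, little-endian immediately followed by big-endian, and a malformed image is exactly one where the two copies disagree:

```python
import struct

def both_endian_32(n: int) -> bytes:
    # ISO 9660 both-byte-order field: 4 bytes little-endian,
    # then the same value as 4 bytes big-endian.
    return struct.pack("<I", n) + struct.pack(">I", n)

def read_both_endian_32(field: bytes) -> int:
    le = struct.unpack("<I", field[:4])[0]
    be = struct.unpack(">I", field[4:8])[0]
    if le != be:
        raise ValueError("LE and BE copies disagree (malformed image)")
    return le
```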

  • Octets? Don't you mean "bytes"? Or is that word problematic now?

    • I wonder if OP used "octets" because the physical pattern on the CD used to represent a byte is a sequence of 17 pits and lands.

      BTW, byte size has historically varied from 4 to 24 bits! Even now, depending on interpretation, you can say 16-bit bytes exist.

      The char type can be 16 bits on some DSP systems.

      I was curious, so I checked. Before this comment, I only knew about 7-bit bytes.

    • Octets is the term used in most international standards instead of the American "byte".

      "Octet" has the advantage that it is not ambiguous. In old computer documentation, from the fifties to the late sixties, a "byte" could have meant any size between 6 bits and 16 bits, the same as "word", which could have meant anything between 8 bits and 64 bits, including values like 12 bits, 18 bits, 36 bits, 60 bits, or even 43 bits.

      Traditionally, computer memory is divided into pages, which are divided into lines, which are divided into words, which are divided into bytes. However, the sizes of all of those "units" varied over very wide ranges in the early computers.

      IBM chose the 8-bit byte for System/360, and IBM's dominance then cemented this now-ubiquitous meaning of "byte"; but there were many computers before the System/360, and many coexisting for some years with the IBM 360 and later mainframes, on which "byte" meant something else.

    • Personally, I prefer the word "bytes", but "octets" is technically more accurate, as there are systems that use differently sized bytes. Many of these are obsolete, but there are also current examples: in most FPGAs that provide SRAM blocks, the memory is actually arranged 9, 18, or 36 bits wide, with the expectation that you'll use the extra bits for parity or flags of some kind.

    • Not problematic, just minor pedantry. Having spent much time reading (and occasionally writing) technical documentation, I use octets, binary prefixes, and other wanton pedantry where they're likely to be understood/appreciated, or where precision is required.

      FTR, ECMA-130 (the standard equivalent to the CD "yellow book") is littered with the term "8-bit bytes", so the distinction was certainly a thing then. Precision matters when simultaneously discussing eight-to-fourteen modulation and the 17 encoding "bits" that hit the media for each octet, as noted in a sibling comment.

      Now, woktets on the other hand...

    • The term "octets" is pretty common in network protocol RFCs; maybe their vocabulary is biased in the direction of that body of writing.

Reed–Solomon codes, or forward error correction more generally, are what you're discussing. All modern drives already do this at a low level anyway.

It would not be hard for a COW file system to use them, but it can easily get out of control, paranoia-wise. Ideally you'd need them for every bit of data, including metadata.

That said, I did have a computer that randomly flipped bits when writing to storage (eventually traced to an iffy power supply), and PAR (a Reed–Solomon forward-error-correction tool) worked great for getting a working backup off the machine. Everything else I tried would end up with at least a couple of bit-flip errors per GB, which made the backups unusable.
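To make the Reed–Solomon idea concrete (a toy erasure-code sketch, not how PAR actually lays out its recovery files): treat k data bytes as evaluations of a degree-&lt;k polynomial over GF(256), publish extra evaluations as parity, and recover the data from any k surviving values by interpolation:

```python
# GF(256) log/antilog tables (polynomial 0x11d, generator 2).
EXP = [0] * 512
LOG = [0] * 256
_x = 1
for _i in range(255):
    EXP[_i] = _x
    LOG[_x] = _i
    _x <<= 1
    if _x & 0x100:
        _x ^= 0x11D
for _i in range(255, 512):
    EXP[_i] = EXP[_i - 255]

def gmul(a: int, b: int) -> int:
    return 0 if a == 0 or b == 0 else EXP[LOG[a] + LOG[b]]

def gdiv(a: int, b: int) -> int:
    return 0 if a == 0 else EXP[(LOG[a] - LOG[b]) % 255]

def interp_at(xs: list, ys: list, x: int) -> int:
    """Lagrange-interpolate the unique degree-<len(xs) polynomial at x."""
    total = 0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        num = den = 1
        for j, xj in enumerate(xs):
            if j != i:
                num = gmul(num, x ^ xj)   # subtraction is XOR in GF(2^8)
                den = gmul(den, xi ^ xj)
        total ^= gmul(yi, gdiv(num, den))
    return total

def encode(data: list, m: int) -> list:
    """m parity bytes for k data bytes; data byte i sits at point i."""
    k = len(data)
    return [interp_at(list(range(k)), data, k + j) for j in range(m)]

def recover(known: dict, k: int) -> list:
    """Rebuild data points 0..k-1 from any k surviving {point: value} pairs."""
    xs = list(known)
    ys = [known[x] for x in xs]
    return [interp_at(xs, ys, i) for i in range(k)]
```

With m parity bytes, any m erasures per stripe are recoverable, which is why a few percent of PAR data could repair the scattered bit flips described above.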