Comment by compsciphd

1 month ago

I've recovered a lot of damaged DVDs, and I think my research at the time showed that DVDs also apply ECC across something larger than the 2048-byte data blocks (maybe 16 of them?).

So when I used ddrescue, I would read in that block size (instead of just 2048 bytes), hoping I would get lucky and get a good read (or enough signal that ECC could repair the large block).

This was very effective at recovering DVDs with repeated reads. Previously, when I had done it with 2048-byte reads only, I would end up with good 2048-byte reads scattered all over (which, if ECC is done on a 16 x 2KB = 32KB block, means I was leaving a lot of data on the floor that should have been recovered on those reads).
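A rough sketch of the read strategy I mean, not what ddrescue does internally. It assumes the ECC block is 16 x 2048 bytes, and `read_at` is a hypothetical callable standing in for whatever actually reads the drive:

```python
ECC_BLOCK = 16 * 2048  # assumed ECC block size: 16 sectors of 2048 user bytes


def rescue_aligned(read_at, total_size, block_size=ECC_BLOCK, max_passes=3):
    """Read [0, total_size) in block_size-aligned chunks, retrying failures.

    read_at(offset, length) is a hypothetical callable that returns `length`
    bytes or raises OSError when the drive reports a read error.
    Returns (image, bad_offsets); unreadable chunks are left zero-filled.
    """
    image = bytearray(total_size)
    pending = list(range(0, total_size, block_size))
    for _ in range(max_passes):
        still_bad = []
        for offset in pending:
            length = min(block_size, total_size - offset)
            try:
                data = read_at(offset, length)
            except OSError:
                still_bad.append(offset)  # try again on the next pass
                continue
            image[offset:offset + length] = data
        pending = still_bad
        if not pending:
            break
    return bytes(image), pending
```

Each pass only revisits the chunks that failed before, so a chunk needs to read cleanly once across all passes.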

ddrescue was also good for this in the sense that if I was trying to recover a DVD (video) from multiple damaged copies of the same disc, as long as they were not damaged in the same location, I was able to fill in the blanks.
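The fill-in-the-blanks idea, as a minimal sketch. This is only an illustration, not ddrescue's implementation; the `(image, bad_offsets)` pairs are a made-up representation of each partial rip:

```python
def merge_images(copies, block_size=32768):
    """Combine several partial rips of the same disc.

    copies is a list of (image_bytes, bad_offsets) pairs, where bad_offsets
    holds the block-aligned offsets that could not be read in that rip.
    Returns (merged, still_bad): blocks bad in every rip stay zero-filled.
    """
    size = len(copies[0][0])
    merged = bytearray(size)
    still_bad = []
    for offset in range(0, size, block_size):
        for image, bad in copies:
            if offset not in bad:
                # First rip with a good read of this block wins.
                merged[offset:offset + block_size] = image[offset:offset + block_size]
                break
        else:
            still_bad.append(offset)  # damaged at the same place in every rip
    return bytes(merged), still_bad
```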

Perhaps you can correct me about the 16-block mechanism; perhaps it was just luck that it worked and my understanding at the time was wrong.

bri3d 1 month ago

You are both correct and the article discusses it accurately:

> Then you have 2048 bytes of user data, scrambled for the reasons mentioned before. The best way to look at the sector as a whole is to think of each sector as 12 “rows” consisting of 172 bytes each. After each 172-byte row is 10 bytes of ECC data called Parity Inner (PI), which is based on Reed-Solomon and applied to both the header and scrambled user data per row within the sector itself. Then, after the user data and parity inner data, is the 4-byte EDC, which is calculated over the unscrambled user data only. Then, finally, Parity Outer (PO) is another form of ECC that is applied by “column” that spans over an entire block of multiple sectors stacked horizontally, or in other words, a group of 16 sectors. Altogether, this adds up to 2366 bytes of recorded sector data.
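As a quick sanity check, the numbers in that quote do add up. A small sketch using only the figures quoted above; the header size and the per-sector PO share are derived by subtraction rather than stated in the excerpt:

```python
USER_DATA = 2048        # bytes of user data per sector
EDC = 4                 # error detection code
ROWS = 12               # rows per sector
ROW_BYTES = 172         # bytes per row before parity
PI_PER_ROW = 10         # Parity Inner bytes appended to each row
SECTORS_PER_BLOCK = 16  # sectors covered by one Parity Outer group
RECORDED_SECTOR = 2366  # total recorded bytes per sector, from the quote

data_frame = ROWS * ROW_BYTES                      # header + user data + EDC
header = data_frame - USER_DATA - EDC              # derived, not quoted
pi_bytes = ROWS * PI_PER_ROW
po_share = RECORDED_SECTOR - data_frame - pi_bytes  # PO bytes stored per sector

print(data_frame, header, pi_bytes, po_share)
print(SECTORS_PER_BLOCK * USER_DATA)        # user bytes per 16-sector block
print(SECTORS_PER_BLOCK * RECORDED_SECTOR)  # recorded bytes per 16-sector block
```

The per-sector PO share comes out to the same size as one row plus its PI bytes, which fits PO being laid out as extra rows spread across the 16 sectors. The 16 x 2048 = 32768 figure is the 32KB block you were reading.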

compsciphd 1 month ago

I see that now, though I wonder whether my assumption about reading an aligned 32KB at a time really improves things or not.
PO works on the 32KB block (after PI fixes what it can in the 2KB blocks).
So if PO works, it was able to correct the errors in every 2KB block within that 32KB block, but that doesn't mean it will succeed on every attempt. My assumption was that if I issue an aligned 32KB read, the hardware operates on the 32KB block once.
But suppose the hardware only operates on 2KB blocks, so that a 32KB read is internally treated as sixteen 2KB reads: if a 2KB read fails even with PI, the drive reads the whole 32KB and corrects with PO, but then forgets everything it just did once it succeeds. Then my assumption about how to do it better fails, as each 2KB block (even within an aligned 32KB read) would still need to get lucky, versus needing to get lucky just once per aligned 32KB block.
The reason I'm wondering is that the "raw bytes" cache the author demonstrates the drive as having is only 2+KB in size (based on what they are reading), and that makes me wonder about my assumptions.
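To put rough numbers on the difference between the two possibilities, here is a toy model. It assumes every PO decode of a damaged 32KB block succeeds independently with the same probability `p`, which is a big simplification, and I don't know which behaviour real drives have:

```python
def tries_if_block_decoded_once(p):
    """Aligned 32KB read, drive decodes the ECC block once and returns all
    16 sectors: one lucky decode is enough."""
    return 1 / p


def tries_if_drive_forgets_32k_reads(p, sectors=16):
    """Aligned 32KB read, but the drive re-decodes the block for each 2KB
    sector and forgets the result: all 16 decodes must succeed in the same
    request."""
    return 1 / p ** sectors


def tries_if_drive_forgets_2k_reads(p, sectors=16):
    """Separate 2KB reads with the same forgetful drive: each sector is
    retried on its own, so this is the total number of 2KB requests."""
    return sectors / p


for p in (0.9, 0.5):
    print(p,
          tries_if_block_decoded_once(p),
          tries_if_drive_forgets_32k_reads(p),
          tries_if_drive_forgets_2k_reads(p))
```

Under this model, if the drive really does forget its work, large aligned reads would be worse than 2KB reads rather than better, so which behaviour the hardware has matters a lot.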