Comment by tptacek
2 months ago
It's either a file layout or it is not a file layout. If you write an affidavit saying it's "sort of like a file layout", the conclusion will be that it is not one. Now, the Illinois Supreme Court found that it was a file layout (wrongly). But they didn't use any of this kind of message board logic to do it; they pulled up a definition for "file layout" from a technical dictionary (which, ironically, pretty clearly established, even more than this thread does, that schemas aren't file layouts), and then they pulled up a definition of "schema" from Merriam-Webster, and that definition of "schema" was so abstract it could have matched anything.
If anybody on the Illinois Supreme Court had known what a schema actually was, we'd have won the case. Further, if the definition of "file layout" had been more material to the Chancery case, it would have been in the trial record that it wasn't one.
> Now, the Illinois Supreme Court found that it was a file layout (wrongly). But they didn't use any of this kind of message board logic to do it; they pulled up a definition for "file layout" from a technical dictionary (which, ironically, pretty clearly established, even more than this thread does, that schemas aren't file layouts)
“Wrongly” is exactly the point I just spent an hour disputing in a long comment with a detailed explanation, specifically a real-world analogy between “a description of the arrangement of the data in a file” and “a description of the arrangement of the clothes in your closet.”
If I understand correctly, you're saying that you expect items in a column to tend to cluster near one another on disk. Notably, though, that doesn't give you any sort of relative or absolute offset. Neither does it have anything to say about, for example, blocks of different types that might be interleaved. Or compression. Or indexes. Or copy-on-write-related garbage collection. Or journaling. Or any number of other things.
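The point about offsets can be made concrete with a toy slotted-page format. This is a hypothetical sketch, not any real engine's format: the page size, record encoding, and function names are all made up for illustration. The schema (a name and an age) stays the same, but insertion order changes where each record lands, and those offsets live in a slot directory that the schema never mentions.

```python
import struct

PAGE_SIZE = 256  # hypothetical, tiny page for illustration

def encode_row(name: str, age: int) -> bytes:
    # Variable-length record: 1-byte name length, name bytes, 4-byte little-endian age.
    raw = name.encode("utf-8")
    return struct.pack("<B", len(raw)) + raw + struct.pack("<i", age)

def build_slotted_page(rows, insertion_order):
    # Records are packed from the end of the page backwards; a slot directory at
    # the front (row count, then one 2-byte offset per row) records where each landed.
    # No free-space checks: illustration only.
    page = bytearray(PAGE_SIZE)
    free_end = PAGE_SIZE
    offsets = {}
    for i in insertion_order:
        rec = encode_row(*rows[i])
        free_end -= len(rec)
        page[free_end:free_end + len(rec)] = rec
        offsets[i] = free_end
    header = struct.pack("<H", len(rows))
    header += b"".join(struct.pack("<H", offsets[i]) for i in range(len(rows)))
    page[:len(header)] = header
    return bytes(page)

def read_slot(page: bytes, slot: int):
    # Follow the slot directory to the record, then decode it.
    (offset,) = struct.unpack_from("<H", page, 2 + 2 * slot)
    n = page[offset]
    name = page[offset + 1:offset + 1 + n].decode("utf-8")
    (age,) = struct.unpack_from("<i", page, offset + 1 + n)
    return name, age

rows = [("ada", 36), ("bob", 41)]
p1 = build_slotted_page(rows, [0, 1])
p2 = build_slotted_page(rows, [1, 0])
print(p1 == p2)                            # False: different bytes on disk
print(read_slot(p1, 0), read_slot(p2, 0))  # ('ada', 36) ('ada', 36)
```

Same schema, same logical rows, different bytes: the offsets come from the page format, not from the table definition.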
Now, if you wanted to argue that a schema serves the same purpose as a file layout, i.e., that it's how a programmer interfaces with the data and that it affects workload performance, that would be fair enough. And given that laws are all about intent, perhaps that would be relevant. (Or perhaps not. I haven't read about the case yet.)
But I think it's fairly reasonable to say that in typical usage an SQL schema is decidedly not a file layout in a literal sense.
> If I understand correctly, you're saying that you expect items in a column to tend to cluster near one another on disk.
That's one thing I'm saying would be sufficient to count as file layout information, yes. I'm not saying it's necessary. Databases can obviously be row-oriented too; knowing that the values don't cluster would also be layout information. So could any number of other things.
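A minimal Python sketch of the row- versus column-oriented distinction, assuming a hypothetical fixed-width schema of two 32-bit integers (`id`, `score`). The schema is identical in both encodings; what differs is where a given value sits in the byte stream, which is exactly the kind of thing a description of the layout has to tell you.

```python
import struct

# Hypothetical fixed-width schema: id INT32, score INT32 (little-endian).
FIELD_SIZE = 4
ROW_SIZE = 2 * FIELD_SIZE

def row_major(rows):
    # Row-oriented: each record's fields sit next to each other (id, score, id, score, ...).
    return b"".join(struct.pack("<ii", rid, score) for rid, score in rows)

def column_major(rows):
    # Column-oriented: all ids first, then all scores, so a column's values cluster.
    ids = b"".join(struct.pack("<i", rid) for rid, _ in rows)
    scores = b"".join(struct.pack("<i", score) for _, score in rows)
    return ids + scores

def score_offset(index, n_rows, column_oriented):
    # Byte offset of the score field of row `index` under each layout.
    if column_oriented:
        return n_rows * FIELD_SIZE + index * FIELD_SIZE
    return index * ROW_SIZE + FIELD_SIZE

rows = [(1, 10), (2, 20), (3, 30)]
for oriented, blob in ((False, row_major(rows)), (True, column_major(rows))):
    off = score_offset(1, len(rows), oriented)
    print(oriented, off, struct.unpack_from("<i", blob, off)[0])
# prints:
# False 12 20
# True 16 20
```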
> Notably, though, that doesn't give you any sort of relative or absolute offset. Neither does it have anything to say about, for example, blocks of different types that might be interleaved. Or compression. Or indexes. Or copy-on-write-related garbage collection. Or journaling. Or any number of other things.
It doesn't have to include offsets or any of those other things. File layout information could be as simple as "data should be aligned to a page boundary for performance" or "this field must reserve space for up to 16 characters" or even "data from different records should not be stored in an overlapping manner, to allow fast erasure"... I could go on. And notice the wardrobe layout example doesn't have offsets either, but the decision to separate jackets from shirts is absolutely one about layout nonetheless.
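The first two example rules can be sketched in a few lines. This assumes a hypothetical engine with 4096-byte pages, records stored in order from the start of the file, and a CHAR(16)-style field that always reserves 16 bytes; the function names are illustrative. Neither rule lists a single offset, yet together they determine where record N lands.

```python
PAGE_SIZE = 4096  # assumed page size

def align_up(offset: int, boundary: int = PAGE_SIZE) -> int:
    # Round offset up to the next multiple of boundary (page alignment).
    return (offset + boundary - 1) // boundary * boundary

def encode_fixed(value: str, width: int = 16) -> bytes:
    # Reserve exactly `width` bytes for the field, NUL-padding short values.
    raw = value.encode("utf-8")
    if len(raw) > width:
        raise ValueError(f"{value!r} does not fit in {width} bytes")
    return raw.ljust(width, b"\x00")

def record_offset(index: int, width: int = 16) -> int:
    # Fixed-width records stored in order; a record never straddles a page
    # boundary, so each page holds PAGE_SIZE // width records.
    per_page = PAGE_SIZE // width
    page, slot = divmod(index, per_page)
    return page * PAGE_SIZE + slot * width
```

With 24-byte records, for example, only 170 fit in a page, so record 170 starts at the next page boundary (offset 4096) rather than at 170 × 24 = 4080.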
> But I think it's fairly reasonable to say that in typical usage an SQL schema is decidedly not a file layout in a literal sense.
It is not complete file layout information. But it certainly can be part of the file layout information.
Imagine you had a table with columns name1 VARCHAR(64) and name2 VARCHAR(64), in that order. Now imagine you modified a couple of bytes on disk so as to swap the 1 and the 2. You can imagine a database where that would be enough to make it think the two columns had swapped contents, right? Could you really claim the schema didn't contain any file layout information in that scenario, when it certainly affected which bytes are interpreted as belonging to which column?
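Here is a toy Python version of that scenario. The catalog format, the fixed 64-byte slot per column, and the function names are all hypothetical (many real engines store VARCHAR values with a length prefix instead of a fixed slot). The row bytes are never touched; swapping two bytes in the stored schema alone changes which column each value is read as.

```python
WIDTH = 64  # hypothetical: each column reserves 64 bytes in the row

def parse_catalog(catalog: bytes) -> list:
    # Column names in declared order; position decides which bytes each column owns.
    return [entry.split(b" ")[0].decode("ascii") for entry in catalog.split(b";")]

def read_row(catalog: bytes, row: bytes) -> dict:
    names = parse_catalog(catalog)
    return {
        name: row[i * WIDTH:(i + 1) * WIDTH].rstrip(b"\x00").decode("utf-8")
        for i, name in enumerate(names)
    }

def swap_bytes(data: bytes, a: int, b: int) -> bytes:
    # Simulate flipping two bytes on disk.
    buf = bytearray(data)
    buf[a], buf[b] = buf[b], buf[a]
    return bytes(buf)

catalog = b"name1 VARCHAR(64);name2 VARCHAR(64)"
row = "alice".encode().ljust(WIDTH, b"\x00") + "bob".encode().ljust(WIDTH, b"\x00")

print(read_row(catalog, row))  # {'name1': 'alice', 'name2': 'bob'}
# Swap the "1" in "name1" with the "2" in "name2"; the row bytes are unchanged.
tampered = swap_bytes(catalog, catalog.index(b"name1") + 4, catalog.index(b"name2") + 4)
print(read_row(tampered, row))  # {'name2': 'alice', 'name1': 'bob'}
```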