Comment by dataflow
2 months ago
> If I understand correctly, you're saying that you expect items in a column to tend to cluster near one another on disk.
That's one thing I'm saying would be sufficient to consider this file layout, yes. I'm not saying it's necessary. Databases can obviously be row-oriented too. Knowing that they don't cluster would also be layout information. As could any number of other things.
> Notably though that doesn't give you any sort of relative or absolute offset. Neither does it have anything to say about, for example, blocks of different types which might be interleaved. Or compression. Or indexes. Or copy on write related garbage collection. Or journaling. Or any number of other things.
It doesn't have to include offsets or any of those other things. File layout information could be as simple as "data should be aligned to a page boundary for performance" or "this field must reserve space for up to 16 characters" or even "data from different records should not be stored in an overlapping manner, to allow fast erasure"... I could go on. And notice the wardrobe layout example doesn't have offsets either, but the decision to separate jackets from shirts is absolutely one about layout nonetheless.
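To make a couple of those concrete, here is a rough sketch of what I mean by layout rules that carry no offsets of their own. Everything in it is made up for illustration: the 4096-byte page size, the helper names `align_up` and `pack_name`, and the simplification that "16 characters" means 16 bytes.

```python
import struct

PAGE_SIZE = 4096   # assumed page size, purely illustrative
NAME_WIDTH = 16    # "reserve space for up to 16 characters" (treated as bytes here)

def align_up(offset, alignment=PAGE_SIZE):
    """Round an offset up to the next multiple of `alignment`."""
    return (offset + alignment - 1) // alignment * alignment

def pack_name(name):
    """Store a name in a fixed 16-byte slot, padded with null bytes."""
    raw = name.encode("utf-8")
    if len(raw) > NAME_WIDTH:
        raise ValueError("name does not fit in the reserved space")
    return struct.pack(f"{NAME_WIDTH}s", raw)

# Neither rule names a concrete position in any particular file,
# yet both constrain where bytes may end up.
next_record_offset = align_up(5000)
packed = pack_name("jacket")
```

Neither rule says where any specific record lives, but both narrow down what the bytes on disk can look like.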
> But I think it's fairly reasonable to say that in typical usage an SQL schema is decidedly not a file layout in a literal sense.
It is not complete file layout information. But it certainly can be part of the file layout information.
Imagine you had a table with columns name1 VARCHAR(64) and name2 VARCHAR(64) in that order. Now imagine you modified a couple of bytes on the disk, such that you swap the 1 and the 2. You can imagine a database where that would be sufficient to confuse it into thinking the two columns had swapped contents, right? Could you really claim the schema didn't contain any file layout information in that scenario, when it certainly affected which bytes are interpreted as belonging to which columns?
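Here is a toy version of that thought experiment. It is not how any real database stores data; the on-disk format (an 8-byte slot per column name followed by fixed 64-byte fields) and the function names `write_table` and `read_table` are invented just to show the effect.

```python
NAME_SLOT = 8    # hypothetical: bytes reserved per column name in the header
FIELD_WIDTH = 64 # VARCHAR(64) stored as a fixed-width field (an assumption)
COLUMNS = ["name1", "name2"]

def write_table(rows):
    """Serialize a header of column names followed by fixed-width records."""
    header = b"".join(c.encode().ljust(NAME_SLOT, b"\0") for c in COLUMNS)
    body = b"".join(v.encode().ljust(FIELD_WIDTH, b"\0") for row in rows for v in row)
    return bytearray(header + body)

def read_table(buf):
    """Read column names from the header, then map each field to its name."""
    n = len(COLUMNS)
    names = [bytes(buf[i * NAME_SLOT:(i + 1) * NAME_SLOT]).rstrip(b"\0").decode()
             for i in range(n)]
    body = buf[n * NAME_SLOT:]
    record_width = n * FIELD_WIDTH
    rows = []
    for off in range(0, len(body), record_width):
        rec = body[off:off + record_width]
        rows.append({names[i]: bytes(rec[i * FIELD_WIDTH:(i + 1) * FIELD_WIDTH])
                     .rstrip(b"\0").decode() for i in range(n)})
    return rows

disk = write_table([("alice", "bob")])
before = read_table(disk)

# Swap the two digit bytes in the header: "name1" <-> "name2".
disk[4], disk[NAME_SLOT + 4] = disk[NAME_SLOT + 4], disk[4]
after = read_table(disk)
```

In this toy format, `before` maps name1 to "alice" while `after` maps name1 to "bob", even though not a single data byte moved.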
Note that "some information related to the file layout" or "some information that has an impact on the file layout" is not "the file layout" in a literal sense. Thus it seems to me to follow that the answer to the question "is this a file layout" should be no.
Symbolically it isn't [ schema -> file layout ]; it's [ schema, engine version -> file layout ]. Even if you had that additional information, neither item by itself nor even the pair together would be correctly considered a file layout. If I have a function f( foo, bar ) -> baz, neither a foo nor a bar is a baz. I can fairly trivially fix a sandwich out of bread, peanut butter, and jam; in no way does that imply that the three ingredients sitting next to each other on the counter are a sandwich.
For that matter, even the [ schema -> file layout ] case isn't technically a file layout any more than a JSON blob is an XML blob. Being trivially translatable doesn't change the definition.
Compare that with the question (also commonly asked by courts) "is thing equivalent in intent (or use, or ...) to other thing" in which case the answer might feasibly be yes.
> Could you really claim the schema didn't contain any file layout information in that scenario, when it certainly affected which bytes are interpreted as belonging to which columns?
In that example you have made an educated guess about the file layout and then taken advantage of that (guessed) information. "You can imagine a database" tells you everything you need to know here, namely that this is entirely dependent on the implementation. So yes, I would claim that the schema did not on its own contain any file layout information, though in conjunction with knowledge of the implementation it could be used to derive such.
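As a rough illustration of that implementation dependence, here are two made-up "engines" fed the identical schema and rows. The names `row_engine` and `column_engine`, the 4-byte field width, and the `.` padding byte are all hypothetical; no real database is being modeled.

```python
# Toy schema: (column name, fixed width in bytes). Widths are invented.
SCHEMA = [("name1", 4), ("name2", 4)]
ROWS = [("ab", "cd"), ("ef", "gh")]

def pad(value, width):
    """Pad a value to a fixed width with '.' so the layout is easy to read."""
    return value.encode().ljust(width, b".")

def row_engine(schema, rows):
    """Hypothetical engine A: store each record's fields next to each other."""
    return b"".join(pad(v, w) for row in rows for v, (_, w) in zip(row, schema))

def column_engine(schema, rows):
    """Hypothetical engine B: store all values of one column together."""
    return b"".join(pad(row[i], w) for i, (_, w) in enumerate(schema) for row in rows)

row_bytes = row_engine(SCHEMA, ROWS)     # b"ab..cd..ef..gh.."
col_bytes = column_engine(SCHEMA, ROWS)  # b"ab..ef..cd..gh.."
```

Same schema, same data, different bytes: the layout only exists once you have picked the engine.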
> I can fairly trivially fix a sandwich out of bread, peanut butter, and jam; in no way does that imply that the three ingredients sitting next to each other on the counter are a sandwich.
What is "sandwich" in this analogy? Nobody is claiming the schema is a "database", or a "table". I was saying it's one component of the file layout.
Using your own analogy: if you know you put the jam near the peanut butter, you know part of the ingredient layout. You can't say "it's not ingredient layout if you haven't told me where the bread is."
The point about the sandwich was that the inputs to a function are not correctly referred to as its output. Those are distinct things.
If you wanted to further extend the analogy to apply to schemas then I guess the recipe would be the database engine and the final product that you eat would be the file layout. Knowing that the final dish will include jam does not mean that you have the final dish in your possession. The jam sitting on the counter is not the final dish.
Importantly, you don't even know how I'm going to use the jam. I could put it only on one half, or I could arrange it in stripes, or I could even use more than two pieces of bread! I might not even make a sandwich! I could even throw it all in a blender and make a (disgusting) smoothie.