Comment by daneel_w
3 years ago
My source is my experience from having worked with MariaDB and MySQL for about 15 years now. The two articles you cited are 10 and 13 years old, so let's run a fresh test with two identical test tables that have their columns indexed in opposite order of eachother:
CREATE TABLE a/b
(
x int,
y int,
z int,
)
CREATE INDEX a ON a (x, y, z)
CREATE INDEX b ON b (z, y, x) # index in reverse order
Some pseudocode to illustrate how I filled the tables:
for x = 1 to 10:
for y = 1 to 1000:
for z = 1 to 10000:
INSERT INTO a/b VALUES (x, y, z)
Table a took 25 minutes and 2 seconds to populate with 100 million rows.
Table b took 37 hours and 40 minutes to populate. Something's up. I built table a a second time, again taking pretty much the same 25 minutes, and I tried building table b again as well but gave up after four hours. It's definitely not the computer acting up, nor the storage.
+--------+------------+
| Table | Checksum |
+--------+------------+
| test.a | 4243132544 |
| test.b | 4243132544 |
+--------+------------+
Not the data either. Both tables checksum identically. Here's what the storage looks like:
-rw-rw---- 1 mysql mysql 6618611712 Jan 15 15:42 a.ibd
-rw-rw---- 1 mysql mysql 8715763712 Jan 16 17:42 b.ibd
+------------+-----------+-----------+
| table_name | data_mb | index_mb |
+------------+-----------+-----------+
| a | 3478.0000 | 2132.0000 |
| b | 3769.0000 | 4290.7344 |
+------------+-----------+-----------+
So the index of table b is obviously a horrendous crow's nest. It's twice as large as the index of table a, and it's even occupying more space than the actual rows themselves. I ran 100k reproducible queries on each of the tables to see how the composite indices perform. More wonky pseudocode:
srand ( 1234567 )
for i = 1 to 100000:
# this includes 0 which is intentional
xr = prng ( 0 to 10 )
yr = prng ( 0 to 1000 )
zr = prng ( 0 to 10000 )
SELECT * FROM a/b WHERE x = xr AND y = yr AND z = zr
table a
=======
42 seconds
48 seconds
35 seconds
table b
=======
87 seconds
71 seconds
69 seconds
Some caching is of course involved. I queried the tables one after another just to see if rattling the cage a bit would have any effect:
table a: 48 seconds
table b: 83 seconds
table a: 61 seconds
table b: 72 seconds
A little bit, but nowhere near enough to outperform the lean and correctly built index on table a. And this is just a simple three-column composite index - the problem I'm illustrating grows exponentially with every column you add to the composite.
Interesting, I’ll have to dig more into this. Thanks for the super detailed response!