Comment by kingcharles

4 years ago

I would say this sums it up:

> the SQLite library consists of approximately 143.4 KSLOC of C code. (KSLOC means thousands of "Source Lines Of Code" or, in other words, lines of code excluding blank lines and comments.) By comparison, the project has 640 times as much test code and test scripts - 91911.0 KSLOC.

How does SQLite compare for bugs in comparison to other DBs? How much are the other players paying out in bug bounties?

I remember reading someone here explain the extensive number of tests the oracle database has and how implementing a feature is more of a waiting for tests to fail game rather than writing code. I couldn't find it now, maybe I'm misremembering.

Especially after reading the MySQL post yesterday[0], I don't know what to expect anymore.

Does anyone have anything to say about the testing story of Postgres?

[0]: https://news.ycombinator.com/item?id=18442941

90 million lines of test code?

That seems ridiculous to me. I can’t imagine what would lead to so many lines, but I strongly doubt it’s all actually source code.

  • A lot of the mentioned tools are fuzzers[1], these are tools that start from certain input(s), say "SELECT * FROM TABLE WHERE some_column = some_value", and repeatedly

    1- mutates[2] the input,

    2- measures some metric of the database's performance (e.g. how many non-syntax errors it spat out)

    3- and introduces more or less mutations to increase or decrease that metric.

    In short, an evolutionary computation where the population being generated is test code, the fitness function is some metric summarizing how the generated test code tested the db code, and mutation operators gradually pushes the generated test code towards a certain optima that we want our tests to have.

    This is an extremely general approach that can be used to generate anything, and it's uncannily effective at finding bugs especially in structured-format consumers like compilers[3] and db engines. You can use it to generate queries as above and the fitness function would be something like "Did the database gave an unholy error squeak or violate some key invariants in the db file?", you can use it to generate general test harnesses that exercise the whole C code and measure coverage via instrumentation, the fitness function would be something like "How much did this mutation covered of the application's instructions?". Not only do they take both of those approaches and many others, but they do it multuple times and independently, so for example it's mentioned that Google and the core development team both have an independent query fuzzer. They mention about 4 or 5 fuzzers doing similar thing.

    And off course fuzzing is just one way to generate tests, there are plenty others.

    >I strongly doubt it’s all actually source code

    I mean, why not? source code is not necessarily hand-written source code.

    [1]: skim https://www.fuzzingbook.org/ for a fairly good overview

    [2]: Generally, mutation is either completely blind and general, i.e. byte-level , or structure-aware, i.e. has some notion of a grammar or a schema governing the thing it mutates, it wouldn't just change a random letter of "SELECT", because the result would almost cerainly be invalid SQL.

    [3]: https://embed.cs.utah.edu/csmith/