← Back to context

Comment by oraguy

6 years ago

> Tests that run for 30 hours is an indication that nobody bothered writing unittests.

Yes, they were not unit tests. There was no culture of unit tests in the Oracle Database development team. A few people called it "unit tests" but they either said it loosely or they were mistaken.

Unit test would not have been effective because every area of the code was deeply entangled with everything else. They did have the concept of layered code (like a virtual operting system layer at the bottom, a memory management layer on top of that, a querying engine on top of that, and so on) but over the years, people violated layers and wrote code that called an upper layer from lower layer leading to a big spaghetti mess. A change in one module could cause a very unrelated module to fail in mysterious ways.

Every test was almost always an integration test. Every test case restarted the database, connected to the database, created tables in it, inserted test data into it, ran queries, and compared the results to ensure that the observed results match the expected results. They tried to exercise every function and every branch condition in this manner with different test cases. The code coverage was remarkable though. Some areas of the code had more than 95% test coverage while some other areas had 80% or so coverage. But the work was not fun.

I'm amazed that it had that many tests that took that long, but ONLY had 80-95% coverage. I understand the product is huge, but that's crazy.

  • I do. It's about state coverage: every Boolean flag doubles the possible state of that bit of code: now you need to run everything twice to retain the coverage.

    FWIW, I know people who work on SQL processing (big data Hive/Spark, not RDMBS), and a recurrent issue is that an optimisation which benefits most people turns out to be pathologically bad for some queries for some users. Usually those with multiple tables with 8192 columns and some join which takes 4h at the best of times, now takes 6h and so the overnight reports aren't ready in time. And locks held in the process are now blocking some other app which really matters to the businesses existence. These are trouble because they still "work" in the pure 'same outputs as before', it's just the side effects can be be so disruptive.

  • Writing tests for error handling can be a pain. You write your product code to be as robust as possible but it isn't always clear how to trigger the error conditions you can detect. This is especially true with integration tests.