
Comment by segfaultbuserr

5 years ago

There are many metrics for measuring the logic complexity of code. An early and perhaps best-known one is Cyclomatic Complexity [0]: the more possible execution paths, the higher the score. You can find more at Software Metric [1].

[0] https://en.wikipedia.org/wiki/Cyclomatic_complexity

[1] https://en.wikipedia.org/wiki/Software_metric

If there is a useful metric, I haven't found it yet. The data on cyclomatic complexity [1] does not convince me, and in practice, the linters I've seen have complained about boring code like this:

    switch (foo) {
    case KIND_1: return parseObjectOfKind1();
    case KIND_2: return parseObjectOfKind2();
    case KIND_3: return parseObjectOfKind3();
    ...
    case KIND_15: return parseObjectOfKind5();
    case KIND_16: return parseObjectOfKind16();
    }

There are 16 paths and yet this code is easy to follow. There is no substitute for human judgement (code reviews).

[1] https://en.wikipedia.org/wiki/Cyclomatic_complexity#Correlat...

  • > case KIND_15: return parseObjectOfKind5();

    I don't like this type of code for exactly this reason.

    • I mean, hilarious point, but I'm not sure it's valid. Clearly he's numbering things here, but I'm guessing that in the real world you're far likelier to see tests of types, or some non-numerically-indexed value, than literally numbered enums.


  • It kind of encourages you to use a lookup table or polymorphism, which isn't far off from what the compiler is likely to do with that switch statement.

    As for correlation to number of defects, another important factor is maintainability (velocity of new features in growing programs, keeping up with infrastructure churn in mature programs). If reducing perceived complexity improves maintainability, and if measuring cyclomatic complexity despite its flaws helps reduce perceived complexity, then it's still a useful metric.

    • Feels like one of those classic, "no one tool gives you the one correct answer. They all give you data and you come to an informed conclusion having utilized one or many of them."

    • > It kind of encourages you to use a lookup table or polymorphism

      And I think that is a problem. switch()ing over an enum will give you a warning about unhandled cases in some languages. But if you start building your own lookup tables, you will be just as prone to typos like mine, except with more code, and less static analysis.

      Yet the cyclomatic complexity drops from 16 to 1. I don't think that's a healthy incentive.
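
      A minimal sketch (hypothetical names, assuming Java) of the lookup-table version in question: the measured cyclomatic complexity collapses to 1, yet the same real paths remain, and the compiler can no longer warn about an unhandled or mistyped kind.

      ```java
      import java.util.Map;
      import java.util.function.Supplier;

      class Dispatch {
          enum Kind { KIND_1, KIND_2, KIND_3 /* ..., KIND_16 */ }

          // The dispatch table: one entry per kind. A typo here (say,
          // mapping KIND_15 to parseObjectOfKind5) compiles cleanly,
          // unlike a switch over an enum in languages that check
          // exhaustiveness.
          static final Map<Kind, Supplier<Object>> PARSERS = Map.of(
              Kind.KIND_1, Dispatch::parseObjectOfKind1,
              Kind.KIND_2, Dispatch::parseObjectOfKind2,
              Kind.KIND_3, Dispatch::parseObjectOfKind3
          );

          // Cyclomatic complexity ~1, but a missing entry is now a
          // runtime NullPointerException instead of a compile-time warning.
          static Object parse(Kind foo) {
              return PARSERS.get(foo).get();
          }

          static Object parseObjectOfKind1() { return "kind1"; }
          static Object parseObjectOfKind2() { return "kind2"; }
          static Object parseObjectOfKind3() { return "kind3"; }
      }
      ```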


  • That's what I was thinking about when I used "ideas". To me there's one main idea in all that code: "we are returning a parsed object of a kind"

    Not that I'm trying to sell "ideas". I don't even know. But it's this very loose concept that floats around my mind when writing code. How many ideas are there in a stanza? Ahh too many. I should break out one of the larger ideas to happen before.

    • Speaking of measuring "ideas", the holy grail of complexity metrics is Kolmogorov complexity - theoretically, the inherent complexity of a piece of data (including code) can be defined as the length of the shortest possible program that generates it. For example, 20 million digits of pi, or a routine that uses 20 branches to return the same object, has low Kolmogorov complexity, because it's easy to write a generator for either, whereas a parser is more complex.

      But it's only a mathematical construct and is uncomputable, just like the halting problem. In real life, for some applications, a good compression algorithm like LZMA is sufficient to approximate it. But I'm not sure it's suitable for measuring computer programs - it would still correlate strongly with the number of lines of code.
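
      As a rough illustration of that approximation (a hypothetical Java sketch, using DEFLATE in place of LZMA): compressed size is a crude, computable stand-in for Kolmogorov complexity, and highly regular data compresses far better than random-looking data of the same length.

      ```java
      import java.util.Random;
      import java.util.zip.Deflater;

      class KolmogorovApprox {
          // Compressed size as a crude, computable upper bound on the
          // (uncomputable) Kolmogorov complexity of `data`.
          static int compressedSize(byte[] data) {
              Deflater d = new Deflater(Deflater.BEST_COMPRESSION);
              d.setInput(data);
              d.finish();
              byte[] buf = new byte[data.length + 64]; // room for incompressible input
              int n = d.deflate(buf);
              d.end();
              return n;
          }

          public static void main(String[] args) {
              // A routine of repeated near-identical branches compresses
              // to almost nothing...
              byte[] regular =
                  "case KIND_1: return theSameObject();\n".repeat(100).getBytes();
              // ...while random bytes of the same length barely compress at all.
              byte[] noise = new byte[regular.length];
              new Random(42).nextBytes(noise);
              System.out.println(compressedSize(regular) + " vs " + compressedSize(noise));
          }
      }
      ```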

  • This is one of the benefits of Cognitive Complexity:

    https://www.sonarsource.com/docs/CognitiveComplexity.pdf

    >Switches

    >A `switch` and all its cases combined incurs a single structural increment.

    >Under Cyclomatic Complexity, a switch is treated as an analog to an `if-else if` chain. That is, each `case` in the `switch` causes an increment because it causes a branch in the mathematical model of the control flow.

    >But from a maintainer’s point of view, a switch - which compares a single variable to an explicitly named set of literal values - is much easier to understand than an `if-else if` chain because the latter may make any number of comparisons, using any number of variables and values.

    >In short, an `if-else if` chain must be read carefully, while a `switch` can often be taken in at a glance.
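
    A tiny illustration of the distinction being quoted (hypothetical example, assuming Java): the `switch` compares one variable against named constants and can be taken in at a glance, while an `if-else if` chain may mix variables and conditions, so each arm has to be read on its own.

    ```java
    class Describe {
        // One variable, literal labels: readable at a glance.
        static String bySwitch(int code) {
            switch (code) {
                case 200: return "ok";
                case 404: return "not found";
                case 500: return "server error";
                default:  return "unknown";
            }
        }

        // Similar branch count, but every arm may test something
        // different, so the chain must be read carefully.
        static String byChain(int code, boolean retried, long elapsedMs) {
            if (code == 200) return "ok";
            else if (code >= 500 && !retried) return "retryable";
            else if (elapsedMs > 30_000) return "timed out";
            else return "unknown";
        }
    }
    ```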

    • Great paper! I particularly liked how they treat boolean expressions. It kind of has a mathematical justification as well since a series of only && or || is trivial for a SAT solver.

      I disagree with them on assuming method calls being free in terms of complexity. Too much abstraction makes it difficult to follow. I've heard of this being called lasagna code since it has tons of layers (unnecessary layers).

      Maybe the complexity introduced by overabstraction requires other tools to analyze? It's tricky to look at it via cyclomatic complexity or cognitive complexity since it is non-local by nature.

  • Except it's not easy to follow. Why does kind 15 return a parsed object of kind 5? That's not counting the complexity of each function called, either.

    • I think they were using the numbers as stand-ins for what would, in reality, be descriptive names (foo, bar, etc.).

    • Ouch, sorry for the typo. But I'll double down on this: The fact that the typo is so obvious underlines that the structure is easy to review :)

> The more possible execution paths, the higher the score.

Great! Now we can all begin to write branchless code. Make branch prediction misses a thing of the past!

  • You jest, but I had a manager who loved Cyclomatic complexity as a measure of programmer ability. I tried in vain to convince him that CC was measuring the complexity of the underlying business logic, not my programming ability.

    I wasted a lot of brainpower cutting down CC in my code doing terrible things like storing function names as strings in hashmaps, i.e.,

        // look up the method name keyed by what would have been the 'if' condition,
        // then invoke it reflectively so no branch appears in the source
        String fn = codepaths.get(if_statement_value);
        obj.getClass().getDeclaredMethod(fn).invoke(param,...);
    

    Because that could replace several 'if' checks, since function calls weren't branches. Of course, no exception handling, because you could save branches by using throws Exception.

    To this day, I wonder if Cyclomatic complexity obsession helped make "enterprise Java" what it is today.

    • The worst part is that this doesn't even reduce the cyclomatic complexity of the code. You replaced conditional branches with table lookups, indirect function calls, and exceptions, but you still have the same number of paths through the code—or perhaps more with the extra code for the table lookups. In the end you're just hiding the real complexity from the tools.

      Minimizing cyclomatic complexity might actually be a reasonable approach, if the complexity were accurately measured without these blind spots. For example, any indirect calls should be attributed a complexity based on the number of possible targets, and any function that can throw an exception should be counted as having at least two possible return paths (continue normally or branch to handler / re-throw exception) at each location where it's called.

The real measure you want is the cyclomatic complexity divided by the complexity required by the problem. The problem with line counting (or cyclo counting) is that it assumes a constant ratio.

  • It is not trivial to understand the necessary complexity of a problem. In SICP, there's an early practice problem that asks you to turn a recursive number-pattern generator into an iterative one. What the problem doesn't tell you is that the generator, obscured by complex recursion, is actually producing a tribonacci-like number sequence. That hidden insight turns a difficult problem into a trivial one.
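
    This sounds like SICP Exercise 1.11, where f(n) = n for n < 3 and f(n) = f(n-1) + 2f(n-2) + 3f(n-3) otherwise (sketched here in Java; which exercise is meant is an assumption). Once the tribonacci-like structure is spotted, the iterative version needs only three running values:

    ```java
    class Sicp111 {
        // Tree-recursive version: transcribes the definition directly,
        // at exponential cost.
        static long fRec(int n) {
            if (n < 3) return n;
            return fRec(n - 1) + 2 * fRec(n - 2) + 3 * fRec(n - 3);
        }

        // Iterative version: keep the last three values, as in a
        // tribonacci-style recurrence.
        static long fIter(int n) {
            if (n < 3) return n;
            long a = 0, b = 1, c = 2; // f(0), f(1), f(2)
            for (int i = 3; i <= n; i++) {
                long next = c + 2 * b + 3 * a;
                a = b;
                b = c;
                c = next;
            }
            return c;
        }
    }
    ```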

    • I certainly didn't mean to imply it was trivial (or even possible). I'm just pointing out that assuming increasing LoC implies an increase in necessary complexity is just wrong.