Comment by gurkendoktor
5 years ago
If there is a useful metric, then I haven't found it yet. The data on cyclomatic complexity[1] does not convince me, and in practice, the linters I've seen have complained about boring code like this:
switch (foo) {
  case KIND_1: return parseObjectOfKind1();
  case KIND_2: return parseObjectOfKind2();
  case KIND_3: return parseObjectOfKind3();
  ...
  case KIND_15: return parseObjectOfKind5();
  case KIND_16: return parseObjectOfKind16();
}
There are 16 paths and yet this code is easy to follow. There is no substitute for human judgement (code reviews).
[1] https://en.wikipedia.org/wiki/Cyclomatic_complexity#Correlat...
> case KIND_15: return parseObjectOfKind5();
I don't like this type of code for exactly this reason.
The association between the two has to be written somewhere. Can you provide an alternative approach that isn't vulnerable to typos?
Their point seems to be the mismatch of names. Imagine seeing the same sequence with names instead of numbers, say something like `case KIND_ORANGE: return parseApple();`.
To the reader that appears to be an error even if it is precisely the thing you want to happen.
This is a basic DRY violation: the number is written twice, and the two copies can get out of sync (as in the example).
I think everyone has coded this way out of expedience and some of us have eventually messed it up too. But you could, for example, use a macro in C to only list the number once. It might not be a trade-off worth making though.
Personally, I've used reflection to do this kind of mapping of enums to functions in a language that supports reflection.
Depends on your language, but you could use the type system instead; e.g. the function is chosen by the compiler based on the type of the value.
In more dynamic languages, you could probably use introspection.
Lastly, this does not alleviate the association issue, but I prefer the alternative of declaring the associations in an array/map somewhere, and using map[enum_value]() instead of the switch.
I mean, hilarious point, but I'm not sure it's valid. Clearly he's numbering things here, but I'm guessing that in the real world you're far likelier to see tests on types or some non-numerically-indexed value than literally numbered enums.
This can happen with any enum to function mapping. You cut and paste and forget to change one or the other.
In C, you could use a macro in this case to make sure that the name/number is only specified once.
It kind of encourages you to use a lookup table or polymorphism, which isn't far off from what the compiler is likely to do with that switch statement.
As for correlation to number of defects, another important factor is maintainability (velocity of new features in growing programs, keeping up with infrastructure churn in mature programs). If reducing perceived complexity improves maintainability, and if measuring cyclomatic complexity despite its flaws helps reduce perceived complexity, then it's still a useful metric.
Feels like one of those classic, "no one tool gives you the one correct answer. They all give you data and you come to an informed conclusion having utilized one or many of them."
> It kind of encourages you to use a lookup table or polymorphism
And I think that is a problem. switch()ing over an enum will give you a warning about unhandled cases in some languages. But if you start building your own lookup tables, you will be just as prone to typos like mine, except with more code, and less static analysis.
Yet the cyclomatic complexity drops from 16 to 1. I don't think that's a healthy incentive.
I think that's a fair criticism of a specific failing of a specific complexity metric. A reasonable response, especially if switch statements or pattern matches in your language are better at detecting completeness than other approaches, might be changing the metric to reward switch statements that completely cover the input range. It also makes it clear why any one metric should not be a target with no exceptions. I don't think it invalidates the concept of measuring complexity.
That's what I was thinking about when I used "ideas". To me there's one main idea in all that code: "we are returning a parsed object of a kind"
Not that I'm trying to sell "ideas". I don't even know. But it's this very loose concept that floats around my mind when writing code. How many ideas are there in a stanza? Ahh too many. I should break out one of the larger ideas to happen before.
Speaking of measuring "ideas", the holy grail of complexity metrics is Kolmogorov complexity - theoretically, the inherent complexity of a piece of data (including code) can be defined as the length of the shortest possible program that generates it. For example, 20 million digits of pi, or a routine that uses 20 branches to return the same object, has low Kolmogorov complexity, because it's easy to write a generator for either; a real parser is more complex.
But it's only a mathematical construction and is uncomputable, just like the halting problem. In real life, for some applications a good compression algorithm like LZMA is sufficient to approximate it. But I'm not sure if it's suitable for measuring computer programs - it would still have a strong correlation to the number of lines of code.
This is one of the benefits of Cognitive Complexity:
https://www.sonarsource.com/docs/CognitiveComplexity.pdf
>Switches
>A `switch` and all its cases combined incurs a single structural increment.
>Under Cyclomatic Complexity, a switch is treated as an analog to an `if-else if` chain. That is, each `case` in the `switch` causes an increment because it causes a branch in the mathematical model of the control flow.
>But from a maintainer’s point of view, a switch - which compares a single variable to an explicitly named set of literal values - is much easier to understand than an `if-else if` chain because the latter may make any number of comparisons, using any number of variables and values.
>In short, an `if-else if` chain must be read carefully, while a `switch` can often be taken in at a glance.
Great paper! I particularly liked how they treat boolean expressions. It kind of has a mathematical justification as well since a series of only && or || is trivial for a SAT solver.
I disagree with them on assuming method calls being free in terms of complexity. Too much abstraction makes it difficult to follow. I've heard of this being called lasagna code since it has tons of layers (unnecessary layers).
Maybe the complexity introduced by overabstraction requires other tools to analyze? It's tricky to look at it via cyclomatic complexity or cognitive complexity since it is non-local by nature.
Except it's not easy to follow. Why does kind 15 return a parsed object of kind 5? That's not counting the complexity of each function called, either.
I think they were using the numbers as stand-ins for what would, in reality, be descriptive names (foo, bar, etc.).
Ouch, sorry for the typo. But I'll double down on this: The fact that the typo is so obvious underlines that the structure is easy to review :)