
Comment by ajuc

5 years ago

> The complexity (sum total of possible interactions) grows as the number of lines within a functional boundary grows.

That's only 1 part of the complexity equation.

When you have 100 lines in 1 function you know exactly the order in which each line will happen and under which conditions by just looking at it.

If you split it into 10 functions 10-lines-long each now you have 10! possible orderings of calling these functions (ignoring loops and branches). And since this ordering is separated into multiple places - you have to keep it in your mind. Good luck inventing naming that will make obvious which of the 3628800 possible orderings is happening without reading through them.

Short functions are good when they fit the problem. Often they don't.

I feel like this is only a problem if the small functions share a lot of global state. If each one acts upon its arguments and returns values without side effects, ordering is much less of an issue IMO.
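As a sketch of that point (all names hypothetical): when each piece is pure, the data dependencies pin down the only valid call order, so the reader doesn't have to memorize it:

```javascript
// Hypothetical pipeline: each function only reads its arguments and
// returns a value, so the composition itself spells out the ordering.
const parse = (text) => text.split(",").map(Number);
const validate = (nums) => nums.filter((n) => !Number.isNaN(n));
const format = (nums) => nums.join(" + ");

// parse -> validate -> format is the only order the data flow allows;
// no shared state means no hidden sequencing to keep in your head.
const result = format(validate(parse("1,2,x,3"))); // → "1 + 2 + 3"
```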

  • Well, if they were one function before they probably share some state.

    Clean code recommends turning that function into a class and promoting the shared state from local variables into fields. After such a "refactoring" you get a nice puzzle trying to understand what exactly happens.

    • I've seen threads on this before, but the "goto" (couldn't stop myself) reaching for object-orientedness to "solve" everything is really frustrating.

      I've found the single greatest contributor to more readable and maintainable code is to limit state as much as possible.

      Which was really hard for me to learn because it can be somewhat less efficient, and my game programmer upbringing hates it.


    • > if they were one function before they probably share some state

      and this is exactly why you refactor to pull out the shared state into parameters, so that each of the "subfunctions" have zero side effects.
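      A minimal before/after sketch of that refactoring (hypothetical code, not from any real codebase):

```javascript
// Before: the helper closes over and mutates shared state, so the
// order and number of calls silently matter.
let runningTotal = 0;
function addImpure(v) {
  runningTotal += v;
}

// After: the shared state becomes a parameter and a return value;
// each call is side-effect-free and easy to reason about in isolation.
function add(total, v) {
  return total + v;
}

let total = 0;
for (const v of [3, 4, 5]) total = add(total, v); // total → 12
```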

    • In javascript I sometimes break up the behaviour of a large function by putting small internal functions inside it. Those internal functions often have side effects, mutating the state of the outer function which contains them.

      I find this approach a decent balance between having lots of small functions and having one big function. The result is self contained (like a function). It has the API of a function, and it can be read top to bottom. But you still get many of the readability benefits of small functions - like each of the internal methods can be named, and they’re simple and each one captures a specific thought / action.
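      A small sketch of the pattern (hypothetical example): the inner helpers are named like small functions but deliberately mutate the outer function's locals:

```javascript
// Hypothetical: one self-contained outer function whose named inner
// steps close over and mutate its local state.
function buildReport(lines) {
  let errors = 0;   // shared state, scoped to this function only
  const rows = [];

  function countErrors() {
    errors = lines.filter((l) => l.startsWith("ERROR")).length;
  }
  function collectRows() {
    for (const l of lines) rows.push(l.trim());
  }

  // The outer body still reads top to bottom, like one big function.
  countErrors();
  collectRows();
  return { errors, rows };
}
```

      The mutation stays invisible from the outside: callers see an ordinary function with an ordinary return value.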

      5 replies →

    • Now this is, in my opinion, usually not good advice (it is like reintroducing global variables), as unnecessary state certainly makes things more difficult to reason about.

      I have read the book (not very recently) and I do not recall this but perhaps I am just immune to such advice.

      I like his book on refactoring more than Clean Code, but the latter introduced me to some good principles like SOLID (a good mnemonic), so I found it somewhat useful.

  • Yes and no.

    What I find is that function boundaries have a bunch of hidden assumptions we don't think about.

    Especially things like exceptions.

    For all these utility functions, are you going to validate the input arguments, which means doing it over and over again? Catch exceptions everywhere, etc.?

    A function can be written for a 'narrow use case', but when it's actually made available to other parts of the system it needs to be more generalized.

    This is the problem.

    Is it possible that 'nested functions' could provide a solution? As in, you only call the function once, in the context of some other function, so why not physically put it there?

    It can have its own stack and be tested separately if needed, but it remains exclusive to the context it is in from a readability perspective - and you don't risk having it used for 'other things'.

    You could even have an editor 'collapse' the function into a single line of code, to make the longer algorithm more readable.

    • The problem is abstraction isn't free. Sometimes it frees up your brain from unnecessary details and sometimes the implementation matters or the abstraction leaks.

      Take something as simple as Substring, a method we use all the time and one that is far clearer than most helper functions I've seen in code bases.

      Is it Substring(string, index, length) or Substring(string, indexStart, indexEnd)

      What happens when you call "abc".Substring(0,4)? Do you get an exception or "abc"?

      What does Substring(0,-1) do? Or Substring(-2,-3)?

      What happens when you call it on null? Sometimes this matters, sometimes it doesn't.
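      JavaScript makes the point nicely, since even its own standard library answers these questions two different ways (`substring` vs `slice`):

```javascript
const s = "abc";

// substring clamps out-of-range indices instead of throwing:
console.log(s.substring(0, 4)); // → "abc"
// substring treats negative arguments as 0, so (0, -1) becomes (0, 0):
console.log(s.substring(0, -1)); // → ""
// slice counts negatives from the end; start >= end yields "":
console.log(s.slice(-2, -3)); // → ""
// Calling either method on null or undefined throws a TypeError.
```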


    • I posted this elsewhere in the thread, but local blocks that define which variables they read, mutate and export would IMO be a very good solution to this problem:

          (reads: var_1, var_2; mutates: var_3) {
             var_3 = var_1 + var_2
             int result_value = var_1 * var_2
          } (exports: result_value)
      
          return result_value * 5
      

      There are a couple of newer languages experimenting with concepts like this, Jai being one: https://youtu.be/5Nc68IdNKdg?t=3493
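      Lacking language support, a rough JavaScript approximation (hypothetical variable names) is an immediately-invoked function whose parameter list documents the reads and whose return value is the only export; the mutation can only be declared in a comment, not enforced:

```javascript
const var1 = 2;
const var2 = 3;
let var3; // the one variable the block is allowed to mutate

// (reads: var1, var2; mutates: var3) { ... } (exports: resultValue)
const { resultValue } = ((v1, v2) => {
  var3 = v1 + v2;
  return { resultValue: v1 * v2 };
})(var1, var2);

const answer = resultValue * 5; // → 30
```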


>If you split it into 10 functions 10-lines-long each now you have 10! possible orderings of calling these functions (ignoring loops and branches). And since this ordering is separated into multiple places - you have to keep it in your mind. Good luck inventing naming that will make obvious which of the 3628800 possible orderings is happening without reading through them.

It's easy to make this argument in the abstract, but harder to demonstrate with a concrete example. Do you happen to have any 100 lines of code that you could provide that would show this as a challenge to compare to the refactored code?

You're likely missing one or more techniques that make this work well:

1. Depth first function ordering, so the execution order of the lines in the function is fairly similar to that of the expanded 100 lines. This makes top to bottom readability reasonable.

2. Explicit naming of the functions to make it clear what they do, not just part1(); part2() etc.

3. Similar levels of abstraction in each function (e.g. not having both a for loop, several if statements based on variables defined in the function, and 3 method calls; instead having 4-5 method calls doing the same thing).

4. Explicit pre/post conditions in each method, made visible by the parameters passed in and the values returned. This more effectively helps a reader understand the lifecycle of relevant variables etc.

In your example of 100 lines, the counterpoint is that now I have a method that has at least 100 ways it could work / fail. By breaking that up, I have the ability to reason about each use case / failure mode.

  • > It's easy to make this argument in the abstract, but harder to demonstrate with a concrete example.

    One of the codebases I'm currently working on is a big example of that. I obviously can't share parts of it, but I'll say that I agree with GP. Lots of tiny functions kill readability.

    > 1. Depth first function ordering, so the execution order of the lines in the function is fairly similar to that of the expanded 100 lines. This makes top to bottom readability reasonable.

    Assuming your language supports this. C++ notably doesn't, especially in the cases where you'd produce such small functions - inside a single translation unit, in an anonymous namespace, where enforcing "caller before callee" order would require you to forward-declare everything up front. Which is work, and more lines of code.

    > 2. Explicit naming of the functions to make it clear what they do, not just part1(); part2() etc.

    That's table stakes. Unfortunately, quite often a properly descriptive name would be 100+ characters long, which obviously nobody does.

    > 3. Similar levels of abstraction in each function

    That's a given, but in a way, each "layer" of such functions introduces its own sublevel of abstraction, so this leads to abstraction proliferation. Sometimes those abstractions are necessary, but I found it easier when I can handle them through few "deep" (as Ousterhout calls it) functions than a lot of "shallow" ones.

    > 4. Explicit pre/post conditions in each method

    These introduce a lot of redundant code, just so that the function can ensure a consistent state for itself. It's such a big overhead that, in practice, people skip those checks, and rely on everyone remembering that these functions are "internal" and had their preconditions already checked. Meanwhile, a bigger, multi-step function can check those preconditions once.
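    To illustrate with a hypothetical function: the multi-step version gets to validate once at its boundary, whereas exposing each step as a public utility would force every one of them to repeat the checks:

```javascript
// Hypothetical multi-step function: one precondition gate up front,
// then "internal" steps that assume the input is already valid.
function averagePositive(values) {
  if (!Array.isArray(values) || values.length === 0 ||
      values.some((v) => typeof v !== "number" || v <= 0)) {
    throw new TypeError("expected a non-empty array of positive numbers");
  }
  // No re-checking below; safe only because of the gate above.
  const total = values.reduce((a, b) => a + b, 0);
  return total / values.length;
}
```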

    • > Lots of tiny functions kill readability.

      I've heard this argument a lot, and I've generally found that the lack of readability is caused by some other problem, not by the small functions themselves.

      >Assuming your language supports this. C++ notably doesn't, especially in the cases where you'd produce such small functions - inside a single translation unit, in an anonymous namespace, where enforcing "caller before callee" order would require you to forward-declare everything up front. Which is work, and more lines of code.

      Here, though, you're used to reading code upwards anyway, so flip the depth-first ordering and make it depth-last (or take the hit on the forward declarations). If you've got more of these than you can handle, your classes are probably too complex regardless (i.e. doing input, parsing, transformation, and output in the same method).

      > quite often a properly descriptive name would be 100+ characters long

      Generally if this is the case, then the containing class / module / block / ? is too big. Not a problem of small methods, problem is at a higher level.

      > Explicit pre/post conditions in each method

      I should have been more explicit here - what I meant is that you know that in the first method, that only the first 3 variables matter, and those variables / parameters are not modified / relevant to the rest of the method. Even without specifically coding pre/post-cons, you get a better feel for the intended isolation of each block. You fall into a pattern of writing code that is simple to reason about. Paired with pure methods / immutable variables, this tends to (IMO) generate easily scannable code. Code that looks like it does what it does, rather than code that requires reading every line to understand.

  • > You're likely missing one or more techniques that make this work well:

    I know how to do it, I just don't always think it's worth it.

    > Do you happen to have any 100 lines of code that you could provide that would show this as a challenge to compare to the refactored code?

    Not 100 lines, just 34, but it's a good example of a function I wouldn't split even if it got to 300 lines.

        function getFullParameters() {
            const result = {
                "gridType": { defaultValue: 1, randomFn: null, redraw: onlyOneRedraw("grid"), },
                "gridSize": { defaultValue: 32, randomFn: null, redraw: onlyOneRedraw("grid"), },
                "gridOpacity": { defaultValue: 40, randomFn: null, redraw: onlyOneRedraw("grid"), },
                "width": { defaultValue: 1024, randomFn: null, redraw: allRedraws(), },
                "height": { defaultValue: 1024, randomFn: null, redraw: allRedraws(), },
                "seed": { defaultValue: 1, randomFn: () => Math.round(Math.random() * 65536), redraw: allRedraws(), },
                "treeDensity": { defaultValue: 40, randomFn: () => Math.round(Math.random() * 100), redraw: onlyOneRedraw("trees"), },
                "stoneDensity": { defaultValue: 40, randomFn: () => Math.round(Math.random() * 20 * Math.random() * 5), redraw: onlyOneRedraw("stones"), },
                "twigsDensity": { defaultValue: 40, randomFn: () => Math.round(Math.random() * 20 * Math.random() * 5), redraw: onlyOneRedraw("twigs"), },
                "riverSize": { defaultValue: 3, randomFn: () => Math.random() > 0.5 ? Math.round(Math.random() * 10) : 0, redraw: onlyRedrawsAfter("river"), },
                "roadSize": { defaultValue: 0, randomFn: () => Math.random() > 0.5 ? Math.round(Math.random() * 10) : 0, redraw: onlyRedrawsAfter("river"), },
                "centerRandomness": { defaultValue: 20, randomFn: () => Math.round(30), redraw: onlyOneRedraw("trees"), },
                "leavedTreeProportion": { defaultValue: 95, randomFn: () => Math.round(Math.random() * 100), redraw: onlyOneRedraw("trees"), },
                "treeSize": { defaultValue: 50, randomFn: () => Math.round(30) + Math.round(Math.random() * 40), redraw: onlyOneRedraw("trees"), },
                "treeColor": { defaultValue: 120, randomFn: () => Math.round(Math.random() * 65536), redraw: onlyOneRedraw("trees"), },
                "treeSeparation": { defaultValue: 40, randomFn: () => Math.round(80 + Math.random() * 20), redraw: onlyOneRedraw("trees"), },
                "serrationAmplitude": { defaultValue: 130, randomFn: () => Math.round(80 + Math.random() * 40), redraw: onlyOneRedraw("trees"), },
                "serrationFrequency": { defaultValue: 30, randomFn: () => Math.round(80 + Math.random() * 40), redraw: onlyOneRedraw("trees"), },
                "serrationRandomness": { defaultValue: 250, randomFn: () => Math.round(100), redraw: onlyOneRedraw("trees"), },
                "colorRandomness": { defaultValue: 30, randomFn: () => Math.round(20), redraw: onlyOneRedraw("trees"), },
                "clearings": { defaultValue: 9, randomFn: () => Math.round(3 + Math.random() * 10), redraw: onlyRedrawsAfter("clearings"), },
                "clearingSize": { defaultValue: 30, randomFn: () => Math.round(30 + Math.random() * 20), redraw: onlyRedrawsAfter("clearings"), },
                "treeSteps": { defaultValue: 2, randomFn: () => Math.round(3 + Math.random() * 2), redraw: onlyOneRedraw("trees"), },
                "backgroundNo": { defaultValue: 1, randomFn: null, redraw: onlyTheseRedraws(["background", "backgroundCover"]), },
                "showColliders": { defaultValue: 0, randomFn: null, redraw: onlyOneRedraw("colliders"), },
                "grassLength": { defaultValue: 85, randomFn: () => Math.round(25 + Math.random() * 50), redraw: onlyTheseRedraws(["background", "backgroundCover"]), },
                "grassDensity": { defaultValue: 120, randomFn: () => Math.round(25 + Math.random() * 50), redraw: onlyTheseRedraws(["background", "backgroundCover"]), },
                "grassSpread": { defaultValue: 45, randomFn: () => Math.round(5 + Math.random() * 25), redraw: onlyTheseRedraws(["background", "backgroundCover"]), },
                "autoredraw": { defaultValue: true, randomFn: null, redraw: noneRedraws(), },
            };
            return result;
        }
    

    There's a lot of value in having all of this in one place. Ordering isn't a problem here, just no need to refactor.

    • I have seen so much GUI code like this in my career! Real world sophisticated GUIs can have tens or hundreds of attributes to setup. Especially ancient Xlib stuff, this was the norm. You have a few functions with maybe hundreds of lines doing pure GUI setup. No problem -- easy to mentally compartmentalise.

      Your deeper point (if I may theorise): Stop following hard-and-fast rules. Instead, do what makes sense and is easy to read and maintain.

    • > I know how to do it, I just don't always think it's worth it.

      Agreed:)

      Generally no problem with this method, other than it's difficult to see at a glance what each item will be set to. Something like the following might be an easy first step:

          function getFullParameters() {
            // Note: JS has no overloading, so a single helper handles both
            // call shapes: param(default, redraw) and param(default, randomFn, redraw).
            function param(defaultValue, a, b) {
              return b === undefined
                ? { defaultValue, randomFn: null, redraw: a }
                : { defaultValue, randomFn: a, redraw: b };
            }
            const result = {
              "gridType": param(1, onlyOneRedraw("grid")),
              "gridSize": param(32, onlyOneRedraw("grid")),
              "gridOpacity": param(40, onlyOneRedraw("grid")),
              "width": param(1024, allRedraws()),
              "height": param(1024, allRedraws()),
              "seed": param(1, () => Math.round(Math.random() * 65536), allRedraws()),
              "treeDensity": param(40, () => Math.round(Math.random() * 100), onlyOneRedraw("trees")),
              "stoneDensity": param(40, () => Math.round(Math.random() * 20 * Math.random() * 5), onlyOneRedraw("stones")),
              "twigsDensity": param(40, () => Math.round(Math.random() * 20 * Math.random() * 5), onlyOneRedraw("twigs")),
              "riverSize": param(3, () => Math.random() > 0.5 ? Math.round(Math.random() * 10) : 0, onlyRedrawsAfter("river")),
              "roadSize": param(0, () => Math.random() > 0.5 ? Math.round(Math.random() * 10) : 0, onlyRedrawsAfter("river")),
              "centerRandomness": param(20, () => Math.round(30), onlyOneRedraw("trees")),
              "leavedTreeProportion": param(95, () => Math.round(Math.random() * 100), onlyOneRedraw("trees")),
              "treeSize": param(50, () => Math.round(30) + Math.round(Math.random() * 40), onlyOneRedraw("trees")),
              "treeColor": param(120, () => Math.round(Math.random() * 65536), onlyOneRedraw("trees")),
              "treeSeparation": param(40, () => Math.round(80 + Math.random() * 20), onlyOneRedraw("trees")),
              "serrationAmplitude": param(130, () => Math.round(80 + Math.random() * 40), onlyOneRedraw("trees")),
              "serrationFrequency": param(30, () => Math.round(80 + Math.random() * 40), onlyOneRedraw("trees")),
              "serrationRandomness": param(250, () => Math.round(100), onlyOneRedraw("trees")),
              "colorRandomness": param(30, () => Math.round(20), onlyOneRedraw("trees")),
              "clearings": param(9, () => Math.round(3 + Math.random() * 10), onlyRedrawsAfter("clearings")),
              "clearingSize": param(30, () => Math.round(30 + Math.random() * 20), onlyRedrawsAfter("clearings")),
              "treeSteps": param(2, () => Math.round(3 + Math.random() * 2), onlyOneRedraw("trees")),
              "backgroundNo": param(1, onlyTheseRedraws(["background", "backgroundCover"])),
              "showColliders": param(0, onlyOneRedraw("colliders")),
              "grassLength": param(85, () => Math.round(25 + Math.random() * 50), onlyTheseRedraws(["background", "backgroundCover"])),
              "grassDensity": param(120, () => Math.round(25 + Math.random() * 50), onlyTheseRedraws(["background", "backgroundCover"])),
              "grassSpread": param(45, () => Math.round(5 + Math.random() * 25), onlyTheseRedraws(["background", "backgroundCover"])),
              "autoredraw": { defaultValue: true, randomFn: null, redraw: noneRedraws(), },
            };
            return result;
          }
      

      For someone looking at this for the first time, the rationale for each random function choice is obtuse so you might consider pulling out each type of random function into something descriptive like randomIntUpto(65536), randomDensity(20, 5), randomIntRange(30, 70).
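      For instance (hypothetical helper names, one possible reading of the original expressions):

```javascript
// Descriptive wrappers for the random-value recipes used above.
const randomIntUpto = (max) => () => Math.round(Math.random() * max);
const randomIntRange = (min, max) => () =>
  min + Math.round(Math.random() * (max - min));
// Product of two uniform draws, biasing results toward low densities.
const randomDensity = (a, b) => () =>
  Math.round(Math.random() * a * Math.random() * b);
```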

      Does it add value? Maybe - ask a junior to review the two and see which they prefer maintaining. Regardless, this code mostly exists at a single level of abstraction, which tends to imply simple refactorings rather than complex ones.

      My guess is that if this extended to multiple (levels / maps / ?) you'd probably split the settings into multiple functions, one per map, right...?


I am surprised that this is the top answer (Edit: it was at the time of writing).

How does splitting code into multiple functions suddenly change the order of the code?

I would expect that these functions would be still called in a very specific order.

And sometimes it does not even make sense to keep this order.

But here is a little example (in a made up pseudo code):

  function positiveInt calcMeaningOfLife(positiveInt[] values)
    positiveInt total = 0
    positiveInt max = 0
    for (positiveInt i=0; i < values.length; i++) 
      total = total + values[i]
      max = values[i] > max ? values[i] : max
    return total - max

===>

  function positiveInt max(positiveInt[] values)
    positiveInt max = 0
    for (positiveInt i=0; i < values.length; i++) 
      max = values[i] > max ? values[i] : max
    return max

  function positiveInt total(positiveInt[] values)
    positiveInt total = 0
    for (positiveInt i=0; i < values.length; i++) 
      total = total + values[i]
    return total

  function positiveInt calcMeaningOfLife(positiveInt[] values)
    return total(values)-max(values)

Better? No?

  • > How does splitting code into multiple functions suddenly change the order of the code?

    Regardless of how smart your compiler is and all the tricks it pulls to execute the code in much the same order, the order in which humans read the pseudo code is changed:

      01. function positiveInt max(positiveInt[] values)
      02.   positiveInt max = 0
      03.   for (positiveInt i=0; i < values.length; i++) 
      04.     max = values[i] > max ? values[i] : max
      05.   return max
    
      07. function positiveInt total(positiveInt[] values)
      08.   positiveInt total = 0
      09.   for (positiveInt i=0; i < values.length; i++) 
      10.     total = total + values[i]
      11.   return total
    
      12. function positiveInt calcMeaningOfLife(positiveInt[] values)
      13.   return total(values) - max(values)
    
    

    Your modern compiler will take care of the order in which the code is executed, but humans need to trace the code line by line as [13, 12, 01, 02, 03, 04, 05, 07, 08, 09, 10, 11]. By comparison, the inline case can be understood sequentially by reading lines 01 to 07 in order.

      01. function positiveInt calcMeaningOfLife(positiveInt[] values)
      02.   positiveInt total = 0
      03.   positiveInt max = 0
      04.   for (positiveInt i=0; i < values.length; i++) 
      05.     total = total + values[i]
      06.     max = values[i] > max ? values[i] : max
      07.   return total - max
    

    > Better? No?

    In most cases, yeah, you're probably better off with the two helper functions. max() and total() are common enough operations, and they are named well enough that we can easily guess their intent without having to read the function body.

    However, depending on the size of the codebase, the complexity of the surrounding functions and the location of the two helper functions it's easy to see that this might not always be the case.

    If you're trying to understand the code for the first time, or if you are trying to track down some complex bug, there's a chance having all the code inline would help you.

    Further, splitting up a large inline function is easier than reassembling many small functions (hope you've got your unit tests!).

    > And sometimes it does not even make sense to keep this order.

    Agreed. But naming and abstraction are not trivial problems. Oftentimes it's the larger/more complex codebases where you see these practices applied most dogmatically.

    • Well, inlining by the compiler is to be expected, but we write code not only for the machine but also for another human being (which could be yourself at another moment in time, of course).

      Splitting the code into smaller functions does not automatically warrant a better design, it is just one heuristic.

      A naive implementation of the principle could perhaps have arrived at a less optimal solution:

        function positiveInt max(positiveInt value1, positiveInt value2)
          return value1 > value2 ? value1 : value2
      
        function positiveInt total(positiveInt value1, positiveInt value2)
          return value1 + value2 
      
        function positiveInt calcMeaningOfLife(positiveInt[] values)
          positiveInt totalSoFar = 0
          positiveInt maxSoFar = 0
          for (positiveInt i=0; i < values.length; i++)
            totalSoFar = total(totalSoFar, values[i])
            maxSoFar = max(maxSoFar, values[i])
          return totalSoFar - maxSoFar
      

      Now this is a trivial example but we can imagine that instead of max and total we have some more complex calculations or even calls to some external system (a database, API etc).

      When faced with a bug, I would certainly prefer the refactoring in the GP comment than one here (or the initial implementation).

      I think that when inlining feels strictly necessary there has been a problem with boundary definition, but I agree that being able to view one single execution path inlined can help to understand the implementation.

      I completely agree that naming and abstraction are perhaps the two most complicated problems.
