Comment by danShumway

5 years ago

> these things are literally what the Clean Code book advocates for

I'm not sure I understand what you're saying, I might be missing your point. The Clean Code book advocates that the ideal function is a single digit number of lines, double digits at the absolute most.

In my mind, the entire process of writing functions that short involves abstracting almost everything your code does. It involves passing data around all over the place and attaching state to objects that get constructed over multiple methods.

How do you create a low-abstraction, bottom-up codebase when every coroutine you need to write is getting turned into dozens of separate functions? I think this is showcased in the code examples that the article author critiques from Clean Code. They're littered with side effects and state mutations. This stuff looks like it would be a nightmare to maintain, because it's over-abstracted.

Martin is writing one-line functions whose entire purpose is to call exactly one other function, passing in a boolean. I don't even know if I would call that top-down programming. Critiquing that kind of code, or calling it characteristic of their writing style, feels almost unfair to top-down programmers.


tharkun__ 5 years ago

I'm not saying the entire book taken literally is how everything must be done. I was trying to say that the general ideas make sense such as keeping a function at the same level of abstraction and keeping them small.

I agree with you that having all functions be one liners is not useful. Keeping all functions to within just a few lines or double digits at most makes sense however. Single digit could be 9. That's a whole algorithm right there! For example quicksort (quoted from the Wikipedia article)

  algorithm quicksort(A, lo, hi) is
    if lo < hi then
        p := partition(A, lo, hi)
        quicksort(A, lo, p - 1)
        quicksort(A, p + 1, hi)

This totally fits the single digit of lines rule and it describes the algorithm on a high enough level of abstraction that you get the idea of the whole algorithm easily. Do you think that inlining the partition function would make this easier or harder to read?

  algorithm quicksort(A, lo, hi) is
    if lo < hi then
        pivot := A[hi]
        i := lo
        for j := lo to hi do
            if A[j] < pivot then
                swap A[i] with A[j]
                i := i + 1
        swap A[i] with A[hi]

        quicksort(A, lo, i - 1)
        quicksort(A, i + 1, hi)

(I hope I didn't mix up the indentation - on the phone here and it's hard to see lol)
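
For anyone who wants to run the two variants side by side, here is a rough JavaScript translation of the pseudocode above. It is only a sketch: the names `quicksortExtracted` and `quicksortInlined`, the default arguments, and the returned array are assumptions added for illustration and are not part of the Wikipedia pseudocode.

```javascript
// Variant 1: partitioning extracted into a helper (Lomuto scheme).
function partition(A, lo, hi) {
  const pivot = A[hi];
  let i = lo;
  // Stops before hi: comparing the pivot with itself would be a no-op.
  for (let j = lo; j < hi; j++) {
    if (A[j] < pivot) {
      [A[i], A[j]] = [A[j], A[i]];
      i++;
    }
  }
  [A[i], A[hi]] = [A[hi], A[i]];
  return i; // final index of the pivot
}

function quicksortExtracted(A, lo = 0, hi = A.length - 1) {
  if (lo < hi) {
    const p = partition(A, lo, hi);
    quicksortExtracted(A, lo, p - 1);
    quicksortExtracted(A, p + 1, hi);
  }
  return A; // returned for convenience; the sort itself is in place
}

// Variant 2: the same partitioning step inlined into the recursion.
function quicksortInlined(A, lo = 0, hi = A.length - 1) {
  if (lo < hi) {
    const pivot = A[hi];
    let i = lo;
    for (let j = lo; j < hi; j++) {
      if (A[j] < pivot) {
        [A[i], A[j]] = [A[j], A[i]];
        i++;
      }
    }
    [A[i], A[hi]] = [A[hi], A[i]];

    quicksortInlined(A, lo, i - 1);
    quicksortInlined(A, i + 1, hi);
  }
  return A;
}
```

Both functions sort in place and behave identically; the only difference is where the partitioning code lives, which is the point under discussion.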

Now some stuff might require 11 or 21 lines. But as we get closer to 100 lines I doubt that it's more understandable and readable to have it all in one big blob of code.

danShumway 5 years ago
> But as we get closer to 100 lines I doubt that it's more understandable and readable to have it all in one big blob of code.
Well, but that's exactly what I'm pushing back against. I think the rule of 30 is often a mistake. I think if you're going out of your way to avoid long functions, then you are probably over-abstracting your code.
I don't necessarily know that I would inline a quicksort function, because that's genuinely something that I might want to use in multiple places. It's an already-existing, well-understood abstraction. But I would inline a dedicated custom sorting method that's only being used in one place. I would inline something like collision detection, nobody else should be calling that outside of a single update loop. In general, it's a code smell to me if I see a lot of helper functions that only exist to be called once. Those are prime candidates for inlining.
This is kind of a subtle argument. I would recommend http://number-none.com/blow/john_carmack_on_inlined_code.htm... as a starting point for why inlined code makes sense in some situations, although I no longer agree with literally everything in this article, and I think the underlying idea I'm getting at is a bit more general and foundational.
> Do you think that inlining the partition function would make this easier or harder to read?
Undoubtedly easier, although you should label that section with a comment and use a different variable name than `i`. Your secondary function is just a comment around inline logic, it's not doing anything else.[0]
But by separating it out, you've introduced the possibility for someone else in the same class or file to call that function without your knowledge. You've also introduced the possibility for that method to contain a bug that won't be visible unless you step through code. You've also created a function with an unlabeled side effect that's only visible by looking at the implementation, which I thought we were trying to avoid.
You've added a leaky abstraction to your code, a function that isn't just only called in one place, but should only be called in one place. It's a function that will produce unexpected results if anyone other than the `quickSort` method calls it, that lacks any error checking; it's not really a self-contained unit of code at all.
And for what benefit? Is the word `partition` really fully descriptive of what's going on in that method? Does it indicate that the method is going to manipulate part of the array? And is anyone ever going to need to debug or read a quicksort method without looking at the partition method? I think that's very unlikely.
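
To make the side-effect point concrete, here is a small JavaScript sketch. It assumes the Lomuto-style `partition` from the pseudocode quoted earlier; the sample arrays and the names `data` and `corrupted` are made up for illustration.

```javascript
// Hypothetical standalone helper: Lomuto partition, following the pseudocode
// quoted earlier in the thread. Nothing in its name or signature says that
// it reorders `A` in place.
function partition(A, lo, hi) {
  const pivot = A[hi];
  let i = lo;
  for (let j = lo; j < hi; j++) {
    if (A[j] < pivot) {
      [A[i], A[j]] = [A[j], A[i]];
      i++;
    }
  }
  [A[i], A[hi]] = [A[hi], A[i]];
  return i;
}

// A caller who only wants "the pivot index" also gets a reordered array.
const data = [5, 1, 4, 2, 3];
const p = partition(data, 0, data.length - 1);
// p is 2, and data is now [1, 2, 3, 5, 4]: rearranged, but not sorted.

// No bounds checking: an out-of-range `hi` silently corrupts the input.
const corrupted = [5, 1, 4, 2, 3];
partition(corrupted, 0, 10);
// corrupted[0] is now undefined and corrupted.length has grown to 11.
```

Called from inside quicksort, none of this matters. Called from anywhere else, the mutation and the missing checks are exactly the surprises described above.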
----
Maybe you disagree with everything I'm saying above, but regardless, I don't think that Clean Code is actually advocating for the same ideas as I am:
> Abstract your code, but abstract your code when or shortly before you hit complexity barriers and after you have enough knowledge to make informed decisions about which abstractions will be helpful -- don't create a brand new interface every time you write a single function.
I don't think that claim is one that Martin would agree with. Or if it is, I don't think it's a statement he's giving actionable advice about inside of his book.
----
[0]: In a language like JavaScript (or anything that supports inline functions), we might still use a function or a new context as a descriptive boundary, particularly if we didn't want `j` and `pivot` to leak:
  function quicksort(data, lowIndex, highIndex) {
    if (lowIndex >= highIndex) { return; }

    const pivotIndex = (function partition(data, lo, hi) {
      //etc...
    }(data, lowIndex, highIndex));

    quicksort(data, lowIndex, pivotIndex - 1);
    quicksort(data, pivotIndex + 1, highIndex);
  }
But for something this trivially small, I suspect that a simple comment would be easier to read.
  function quicksort(data, lowIndex, highIndex) {
    if (lowIndex >= highIndex) { return; }

    /* Partition */
    let pivot = data[highIndex];
    //etc...

    quicksort(data, lowIndex, partitionIndex - 1);
    quicksort(data, partitionIndex + 1, highIndex);
  }
Remember that your variable and function names can go out of date at the same speed as any of your comments. But the real benefit of inlining this partition function (besides readability, which I'll admit is a bit subjective), is that we've eliminated a potential source of bugs and gotten rid of a leaky abstraction that other functions might be tempted to call into.
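
For reference, a filled-in sketch of the footnote's first variant might look like the following. The elided partition body is assumed here to be the Lomuto scheme from the pseudocode earlier in the thread; the default arguments and the `return data` are also additions for illustration.

```javascript
function quicksort(data, lowIndex = 0, highIndex = data.length - 1) {
  if (lowIndex >= highIndex) { return data; }

  // The immediately invoked function is only a labeled scope:
  // `pivot`, `i` and `j` do not leak into the rest of quicksort.
  const pivotIndex = (function partition(lo, hi) {
    const pivot = data[hi];
    let i = lo;
    for (let j = lo; j < hi; j++) {
      if (data[j] < pivot) {
        [data[i], data[j]] = [data[j], data[i]];
        i++;
      }
    }
    [data[i], data[hi]] = [data[hi], data[i]];
    return i;
  })(lowIndex, highIndex);

  quicksort(data, lowIndex, pivotIndex - 1);
  quicksort(data, pivotIndex + 1, highIndex);
  return data; // returned for convenience; the sort is in place
}
```

Because `partition` is a named function expression, its name is bound only inside its own body, so no other code in the file can call it.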
- TeMPOraL 5 years ago
  
  > Remember that your variable and function names can go out of date at the same speed as any of your comments.
  A very good point, thank you for voicing it!
  As luck would have it, two days ago I was writing comments about this at work during code review - there was a case where a bunch of functions taking a "connection" object had it replaced with a "context" object (which encapsulated the connection and some other stuff), but the parameter naming wasn't updated. I was briefly confused by this when studying the code.
  
- tharkun__ 5 years ago
  
  Glad you admitted subjectivity. I will too and I am on the other side of that subjectivity. For the quicksort example, that was the pseudo code from the Wikipedia article.
  I personally think that the algorithm is easier to grasp conceptually if I just need to know 'it partitions the data and then runs quicksort on both of those partitions. Divide and conquer. Awesome'.
  I don't care at that level of abstraction _how_ the partitioning works. In fact there are multiple different partition functions people have created that have various characteristics. The fact that this changes its parameters is generally bad if you ask me, but in this specific case of a general purpose and high performance sorting function it is totally acceptable for the sake of speed and memory use considerations. In other 'real world' scenarios of 'simple business software' I would totally forsake that speed and memory efficiency for better abstractions. This is also where Carmack is basically not a good example. His world is that of high performance graphics and game engine programming where he's literally the one dude that has it all in his head. I can totally see why he would have different needs from someone like me who has to go look at a different piece of code that I've never seen before, multiple times every day.
  You mention various problems with this code such as the in place nature and bad naming and such. Most of that is simply the copy from Wikipedia and yes I agree I would also rename these in real code. I do not agree however with the parts about 'someone else could call this now'. To stick with Clean Code's language of choice, the partition function would actually be a private method to the quicksort class. Thus nobody outside can call it but the algorithm itself, as a self contained unit is not just a blob of code.
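
  A rough sketch of that idea, written in JavaScript to match the footnote code rather than in Java: the class name `QuickSorter` and its method names are hypothetical, and `#` private static methods stand in for Java's `private`.

```javascript
class QuickSorter {
  // Public entry point: sorts the array in place and returns it.
  static sort(data) {
    QuickSorter.#sortRange(data, 0, data.length - 1);
    return data;
  }

  // High-level shape of the algorithm: partition, then recurse on both sides.
  static #sortRange(data, lo, hi) {
    if (lo >= hi) return;
    const p = QuickSorter.#partition(data, lo, hi);
    QuickSorter.#sortRange(data, lo, p - 1);
    QuickSorter.#sortRange(data, p + 1, hi);
  }

  // Private helper (Lomuto scheme): not reachable from outside the class.
  static #partition(data, lo, hi) {
    const pivot = data[hi];
    let i = lo;
    for (let j = lo; j < hi; j++) {
      if (data[j] < pivot) {
        [data[i], data[j]] = [data[j], data[i]];
        i++;
      }
    }
    [data[i], data[hi]] = [data[hi], data[i]];
    return i;
  }
}
```

  Code outside the class can only call `QuickSorter.sort(...)`; writing `QuickSorter.#partition(...)` anywhere else is a syntax error. Other methods inside the same class can still call the helper.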
  Same with your inlining of collision detection and such. I don't think I would do that. I think it has value to know that the overall loop is something like
    do_X()
    do_Y()
    detect_collisions()
    do_Z()
  Overall "game loop" easily visible straight away. The collision detection function might be a private method to that class you're in though. Will depend on real world scenario I would say.
  You also mention you could use a comment. Your comment only does half the job though. It only tells me where the partitioning starts, not where it ends. In this example it's sort of easy to see. As the code we are talking about gets larger it's not as easy any more. So you have to make sure to make a new comment above every 'section'. Problem is that this can be forgotten. Now I need to actually read and fully understand the code to figure out these boundaries. I can no longer just tell my editor to jump over something. I can no longer have the compiler ensure that the boundaries are set (it will ensure proper function definition and calls).
  