Comment by danShumway

5 years ago

> If you have to dig into the details of a function to understand what it does, you have failed to sufficiently explain what the function does through its naming and set of arguments.

This isn't always true in my experience. Often when I need to dig into the details of a function it's because how it works is more important than what it says it's doing. There are implementation concerns you can't fit into a function name.

Additionally, I have found that function names become outdated at about the same rate as comments do. If the common criticism of code commenting is that "comments are code you don't run", function names also fall into that category.

I don't have a universal rule on this; I think managing code complexity is highly application-dependent, and depends on the size of the team looking at the code, on the age of the code, and on how fast the code is being iterated on and rewritten. However, in many cases I've started to find that it makes sense to inline certain logic, because you get rid of the risk of names going out of date just like code comments, and you remove any ambiguity over what the code actually does. There are some other benefits as well, but they're beyond the scope of the current conversation.
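
To illustrate (a sketch in TypeScript, with hypothetical names): a helper whose name stopped matching its behavior after an edit, versus the same logic inlined, where there is no name left to go stale:

    // Hypothetical example: the helper was named when it only trimmed
    // whitespace; later the lowercasing was added and the name went
    // stale, exactly the way an outdated comment does.
    function normalizeWhitespace(raw: string): string {
      return raw.trim().toLowerCase(); // name says whitespace; code does more
    }

    // Inlined, the behavior is exactly what you read; no name can drift.
    function registerUser(rawName: string): string {
      const username = rawName.trim().toLowerCase();
      return username;
    }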

Perfect abstractions are relatively rare, so in instances where abstractions are likely to be very leaky (which happens more often than people suspect), it is better to be extremely transparent about what the code is doing, rather than hiding it behind a function name.

> The complexity (sum total of possible interactions) grows as the number of lines within a functional boundary grows.

I'll also push back against this line of thought. The sum total of possible interactions does not decrease when you move code out into a separate function. The same lines of code still run, and each line carries the same potential to contain a bug. In fact, in many cases, adding additional interfaces between components and generalizing them can increase the number of code paths and potential failure points.

If you define complexity by the sum total of possible interactions (which is itself a problematic definition, but I'll talk about that below), then complexity always increases when you factor out functions, because the interfaces, error-handling, and boilerplate code around those functions increases the number of possible interactions happening during your function call.
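
As a minimal sketch of that extra surface (TypeScript, hypothetical names): the inline version has a single code path, while the extracted version adds an exported contract, input validation, and an error path that every caller now has to consider:

    // Inline version: one code path, no public contract to document.
    function reportTotalInline(prices: number[]): string {
      let total = 0;
      for (const p of prices) total += p;
      return `Total: ${total.toFixed(2)}`;
    }

    // Extracted version: the same arithmetic, plus an exported contract,
    // input validation, and an error path every caller must now consider.
    export function sumPrices(prices: number[]): number {
      if (!Array.isArray(prices)) {
        throw new TypeError("sumPrices expects an array of numbers");
      }
      let total = 0;
      for (const p of prices) total += p;
      return total;
    }

    function reportTotalExtracted(prices: number[]): string {
      return `Total: ${sumPrices(prices).toFixed(2)}`; // can now throw
    }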

> The complexity (sum total of possible interactions) grows as the number of lines within a functional boundary grows.

What I've come to understand is that complexity is relative. A solution that makes a codebase less complex for one person in an organization may make a codebase more complex for someone else in the organization who has different responsibilities over the codebase.

If you are building an application with a large team, and there are clear divisions of responsibilities, then functional boundaries are very helpful because they hide the messy details about how low-level parts of the code work.

However, if you are responsible for maintaining both the high-level and low-level parts of the same codebase, then separating that logic can sometimes make the program harder to manage, because you still have to understand how both parts of the codebase work, but now you also have to understand how the interfaces and abstractions between them fit together and what their limitations are.

In single-person projects where I'm the only person touching the codebase I do still use abstractions, but I often opt to limit the number of abstractions, and I inline code more often than I would in a larger project. This is because if I'm the only person working on the code, I need to be able to hold almost the entire codebase in my head at the same time in order to make informed architecture decisions, and managing a large number of abstractions on top of their implementations makes the code harder to reason about and increases the number of things I need to remember. This was a hard-learned lesson for me, but has made (I think) an observable difference in the quality and stability of the code I write.

>> If you have to dig into the details of a function to understand what it does, you have failed to sufficiently explain what the function does through its naming and set of arguments.

> This isn't always true in my experience. Often when I need to dig into the details of a function it's because how it works is more important than what it says it's doing. There are implementation concerns you can't fit into a function name.

Neither of these things is quite right. Yes, if you have to dig into the details of a function to understand what it does, it hasn't been explained well enough. No, the prototype cannot contain enough information to explain it. No, you shouldn't look at the implementation either - that leads to brittle code where you start to rely on implementation behavior that isn't part of the interface.

The interface and implementation of a function are separate. The former should be clearly-documented - a descriptive name is good, but you'll almost always also need docstrings/comments/other documentation - while you should rarely rely on details of the latter, because if you are, that usually means that the interface isn't defined clearly enough and/or the abstraction boundaries are in the wrong places (modulo things like looking under the hood to refactor, improve performance, etc - all abstractions are somewhat leaky, but you shouldn't be piercing them regularly).
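
A small sketch of that separation (TypeScript, hypothetical function): the doc comment states the contract callers may rely on, while the ordering behavior in the body is an implementation detail that piercing the abstraction would couple you to:

    /**
     * Returns the IDs of users that are currently active.
     * Contract: each active ID appears exactly once.
     * Not part of the contract: the order of the result.
     */
    function activeUserIds(users: Map<string, boolean>): string[] {
      const ids: string[] = [];
      for (const [id, active] of users) {
        if (active) ids.push(id);
      }
      // Incidental detail: Map iteration is insertion-ordered, so callers
      // could observe a stable order. Relying on that couples them to this
      // implementation rather than to the documented interface.
      return ids;
    }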

> If you define complexity by the sum total of possible interactions (which is itself a problematic definition, but I'll talk about that below), then complexity always increases when you factor out functions, because the interfaces, error-handling, and boilerplate code around those functions increases the number of possible interactions happening during your function call.

This - this is what everyone who advocates for "small functions" doesn't understand.

  • > all abstractions are somewhat leaky, but you shouldn't be piercing them regularly.

    I think this gets back to the old problem of "documentation is code that doesn't run." I'm not saying get rid of documentation -- I comment my code to an almost excessive degree, because I need to be able to remember in the future why I made certain decisions, what tradeoffs went into them, and whether there are any potential bugs or edge-cases that I haven't tested for yet.

    But what I am saying is that it is uncommon for an interface to be perfectly documented -- not just in code I write, but especially in 3rd-party libraries. It's not super-rare for me to need to dip into library source code to figure out behaviors that they haven't documented, or interfaces that changed between versions and aren't described anywhere. People struggle with good documentation.

    Sometimes the leak is performance: if a 3rd-party library is slow, it's often because of how it's implemented. I've run into that with d3 addons in the past, where changing how my data is formatted results in large performance gains, and only the implementation logic revealed that to me (a sketch of this pattern is at the end of this comment). Is that a leaky abstraction? Sure, I suppose, but it doesn't seem to be uncommon. Is it fragile? Sure, a bit, but I can't release charts that drop frames whenever they zoom just because I refuse to pay attention to the implementation code.

    So I get what you're saying, but to me "abstractions shouldn't be leaking" is a bit like saying "code shouldn't have bugs", or "minor semver increases should have no breaking changes." I completely agree, but... it does, and they do. Relying on undocumented behavior is a problem, but sometimes documented behavior diverges from the implementation. Sometimes the abstractions are so leaky that you don't have a choice.

    And that's not just a problem with 3rd-party code, because I'm also not a perfect programmer, and sometimes my own documentation on internal methods diverges from my implementation. I try very hard not to let that happen, but I also try hard to compensate for the fact that I'm a human being who makes mistakes. I try to build systems that are less work to maintain and less prone to having their documentation decay over time. I've found that in code I'm personally writing, it can be useful to sidestep the problem entirely and inline the whole abstraction. Then I don't have to worry about fragility at all.

    If you're not introducing a 3rd-party library or a separate interface for every measly 50 lines of code, and instead you embed your single-use chunk of logic directly into the function that would have called it, then you never have to worry about whether the abstraction is leaky. That can have a tangible effect on the maintainability of your program, because it reduces the number of opportunities you have to mess up an interface or its documentation.

    For perfect abstractions, I agree with you. I'm not saying get rid of all abstractions. I just think that perfect abstractions are more difficult and rarer than people suppose, and sometimes for some kinds of logic, a perfect abstraction might not exist at all.
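
    To make the performance point above concrete, here is a minimal sketch (TypeScript; chartLib and its render signature are invented for illustration, not a real d3 addon API). The scenario assumes that reading the library's source revealed render runs on every zoom/pan frame, so any per-call data conversion is paid per frame:

        // Hypothetical chart library whose source we had to read: assume
        // its render path expects Date objects and is invoked on every
        // zoom/pan frame.
        declare const chartLib: {
          render(rows: { date: Date; value: number }[]): void;
        };

        const rawRows: { date: string; value: number }[] = [
          { date: "2020-01-01", value: 3 },
          { date: "2020-01-02", value: 5 },
        ];

        // Naive: convert the data inside the render call, paying the
        // parsing cost on every frame.
        function renderFrameNaive(): void {
          chartLib.render(
            rawRows.map(r => ({ date: new Date(r.date), value: r.value }))
          );
        }

        // Informed by the implementation: parse once, up front, and hand
        // the library data in the shape it is fastest with.
        const parsedRows = rawRows.map(r => ({
          date: new Date(r.date),
          value: r.value,
        }));
        function renderFrame(): void {
          chartLib.render(parsedRows);
        }

    Parsing once up front is only safe because we looked at the implementation -- which is exactly the fragile-but-sometimes-necessary tradeoff described above.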