Comment by hackinthebochs

5 years ago

There's a lot of bad advice being tossed around in this thread. If you are worried about having to jump through multiple files to understand what some code is doing, you should consider that your naming conventions are the problem, not the fact that code is hidden behind functional boundaries.

Coding at scale is about managing complexity. The best code is code you don't have to read because of well-named functional boundaries. Without these functional boundaries, you have to understand how every line of a function works, and then mentally model the entire graph of interactions at once, because of the potential for interactions between lines within a functional boundary. The complexity (sum total of possible interactions) grows as the number of lines within a functional boundary grows. The cognitive load to understand code grows as the number of possible interactions grows. Keeping methods short and hiding behavior behind well-named functional boundaries is how you manage complexity in code.

The idea of code telling a story is that a unit of work should explain what it does through its use of well-named variables, function/object names, and how data flows between functions/objects. If you have to dig into the details of a function to understand what it does, you have failed to sufficiently explain what the function does through its naming and set of arguments.
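A hypothetical sketch of that idea (the names `calc` and `late_fee_cents` are invented for illustration): the two functions below compute the same thing, but only the second explains itself through its name, parameter names, and type hints.

```python
from datetime import date


# Opaque: a reader must open the body to learn what d, x and r mean.
def calc(d, x, r):
    n = (x - d).days
    return 0 if n <= 0 else n * r


# Self-explaining: name, parameters and types tell the story, so most
# callers never need to read the body at all.
def late_fee_cents(due_date: date, paid_on: date, fee_cents_per_day: int) -> int:
    """Charge fee_cents_per_day for each day payment arrives after due_date."""
    days_late = (paid_on - due_date).days
    if days_late <= 0:
        return 0
    return days_late * fee_cents_per_day
```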

> you have failed to sufficiently explain

This is the problem right here. I don't just read code I've written, and I don't only read perfectly abstracted code. When I'm stuck reading code by someone who loves the book and tries their best to follow its conventions, I find it far more difficult. I'm usually reading their code to fully understand it myself (e.g. in a review) or to fix a bug, and I find it infuriating that I'm jumping through dozens of files just so everything looks nice on a slide. Names are great, and I fully appreciate good naming, but pretending that using a ton of extra files just to improve naming slightly isn't a hindrance is wild.

I will take the naming hit in return for locality. I'd like to be able to hold more than 5 lines of code in my head, but leaping all over the filesystem just to see 3-line or 5-line classes that delegate to yet another class is too much.

  • Carmack once suggested that people inline their functions more often, in part so they could “see clearly the full horror of what they have done” (paraphrased from memory) as code gets more complicated. Many helper functions can be replaced by comments and the code inlined. I tried this last year and it led to overall more readable code, imho.
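A minimal sketch of that inlined style, using a hypothetical task (summing per-customer quantities from `customer,quantity` lines): what might have been three helpers becomes three commented sections of one function, so the whole flow reads top to bottom.

```python
def summarize_orders(lines: list[str]) -> dict[str, int]:
    """Sum quantities per customer from 'customer,quantity' lines."""
    totals: dict[str, int] = {}
    for raw in lines:
        # Parse: skip blanks and '#' comments, split into fields.
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        customer, qty_text = line.split(",", 1)
        customer = customer.strip()

        # Validate: quantities must be non-negative integers.
        qty = int(qty_text)
        if qty < 0:
            raise ValueError(f"negative quantity for {customer!r}: {qty}")

        # Aggregate: accumulate the running total for this customer.
        totals[customer] = totals.get(customer, 0) + qty
    return totals
```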

  • The idea is that without proper boundaries, finding the line that needs to be changed may be a lot harder than clicking through files with an IDE. Smaller components also help with code reviews, since it’s a lot easier to understand a line within the context of a component (or method name) without having to understand what the huge globs of code before it are doing. Also, like you said, a lot of the time a developer has to read code they didn’t write, so there are other factors to consider, like how easy it is for someone from another team to make a change or whether a new employee could easily digest the codebase.

    • The problem being solved here is just scope, not reusability. Functions are a bad solution because they force non-locality. A better way to solve this would be local scope blocks that define their dependencies.

      E.g. something like:

          (reads: var_1, var_2; mutates: var_3) {
             var_3 = var_1 + var_2
          }
      

      You could also define which variables defined in the block get elevated, like return values:

          (reads: var_1, var_2; mutates: var_3) {
             var_3 = var_1 + var_2
             int result_value = var_1 * var_2
          } (exports: result_value)
      
          return result_value * 5
      

      This is also a more tailored solution to the problem than a function; it allows finer-grained control over scope restriction.

      It's frustrating that most existing languages don't have this kind of feature. Regular scope blocks suck because they don't allow you to define the specific ways in which they are permeable, so they only restrict scope in one direction (things inside the scope block are restricted) - but the outer scope is what you really want to restrict.
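Python has no such construct either. A rough approximation, offered as a convention rather than a real language feature: a nested function whose parameters are the variables the block reads and whose return values are the variables it mutates or exports. Python closures can still read enclosing names implicitly, so nothing enforces the boundary. The variable names mirror the pseudocode above.

```python
def compute(var_1: int, var_2: int) -> tuple[int, int]:
    var_3 = 0  # outer variable the block is allowed to mutate

    # Stand-in for: (reads: var_1, var_2; mutates: var_3) {...} (exports: result_value)
    # Reads come in as parameters; the mutated value and the export go out
    # as return values, so the call site lists every way the block touches state.
    def block(a: int, b: int) -> tuple[int, int]:
        new_var_3 = a + b
        result_value = a * b
        return new_var_3, result_value

    var_3, result_value = block(var_1, var_2)
    return var_3, result_value * 5
```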

      You could also introduce this functionality to IDEs, without modifying existing languages. Highlight a few lines, and it could show you a pop-up explaining which variables that section reads, mutates and defines. I think that would make reading long pieces of code significantly easier.
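A rough prototype of that IDE feature using Python's standard `ast` module. It is a sketch under simplifying assumptions: the highlighted lines must parse on their own, the caller supplies the set of names already defined in the outer scope, and only plain names are inspected (attribute access, subscripts, nested scopes, and the implicit read in `x += 1` are ignored). Builtins such as `print` will show up as reads.

```python
import ast


def classify_names(snippet: str, outer_names: set[str]) -> dict[str, set[str]]:
    """Report which plain names a snippet reads, mutates, and defines.

    mutates = assigned names that already exist in the outer scope
    defines = assigned names that are new to the snippet
    reads   = loaded names the snippet does not define itself
    """
    loaded: set[str] = set()
    assigned: set[str] = set()
    for node in ast.walk(ast.parse(snippet)):
        if isinstance(node, ast.Name):
            if isinstance(node.ctx, ast.Load):
                loaded.add(node.id)
            else:  # ast.Store or ast.Del
                assigned.add(node.id)
    defines = assigned - outer_names
    return {
        "reads": loaded - defines,
        "mutates": assigned & outer_names,
        "defines": defines,
    }
```

An editor plugin could run something like this on the current selection and show the three sets in a pop-up.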


    • > Clicking through files with an IDE

      This is a big assumption. Many engineers prefer to grep through code without an IDE; the "clean code" style breaks grep/GitHub code search and forces someone to install an IDE with go-to-declaration/find-usages. On balance I prefer the clean code style and bought the JetBrains ultimate pack, but I do understand that some folks work with grep/vim/code search and would rather not download a project to figure out how it works.


>Coding at scale is about managing complexity.

I would extend this one level higher to say managing complexity is about managing risk. Risk is usually what we really care about.

From the article:

>any one person's opinions about another person's opinions about "clean code" are necessarily highly subjective.

At some point CS as a profession has to find the right balance of art and science. There's room for both. Codifying certain standards is the domain of professions (in the truest sense of the word) and not art.

Software often likens itself to traditional engineering disciplines. Those traditional engineering disciplines manage risk through codified standards built through industry consensus. Somebody may build a pressure system that doesn't conform to standards. They don't get to say "well your idea of 'good' is just an opinion so it's subjective". By "professional" standards they have built something outside the acceptable risk envelope and, if it's a regulated engineering domain, they can't use it.

This isn't to say a coder would have to follow rigid rules constantly or that it needs a regulatory body, but that the practice of deviating from standardized best-practices should be communicated in terms of the risk rather than claiming it's just subjective.

  • A lot of "best practices" in engineering were established empirically, after root cause analysis of failures and successes. Software is more or less evolving along the same path (structured programming, OOP, higher-than-assembly languages, version control, documented ISAs).

    Go back to earlier machines and each version had its own assembly language and instruction set. Nobody would ever go back to that era.

    OOP was pitched as a one-size-fits-all solution to all problems, and as a checklist of items that would turn a cheap offshored programmer into a real software engineer thanks to design patterns and abstractions dictated by a "Software Architect". We all know it to be false, and bordering on snake oil, but it still had some good ideas. Having a class encapsulate complexity and define interfaces is neat. It forces you to think in terms of abstractions and helps readability.

    > This isn't to say a coder would have to follow rigid rules constantly or that it needs a regulatory body, but that the practice of deviating from standardized best-practices should be communicated in terms of the risk rather than claiming it's just subjective.

    As more and more years pass, I'm less and less against a regulatory body. It would help get rid of snake-oil salesmen in the industry and limit offshoring to barely qualified coders. It would also simplify hiring, since a known certification tells you someone meets at least a certain bar.

    • Software is to alchemy what software engineering is to chemistry. Software engineering hasn't been invented yet. You need a systematizing scientific revolution (Kuhn style) before you can or should create a regulatory body to enforce it. Otherwise you're just enforcing your particular brand of alchemy.


    • > OOP was pitched as a one-size-fits-all solution to all problems, and as a checklist of items that would turn a cheap offshored programmer into a real software engineer.

      Not initially. Eventually, everything in software development that reaches a certain minimal level of popularity gets pitched by snake-oil salesmen to enterprise management as a solution to that problem, including things developed specifically to deal with other solutions being cargo-culted and repackaged that way, whether it's a programming paradigm, a development methodology, or a metamethodology.

    • >having a known certification that tells you someone at least meets a certain bar.

      This was tried a few years back by creating a Professional Engineer licensure for software, but it went away due to lack of demand. It could make sense to create demand artificially by having the government require it for, say, safety-critical software, but I have a feeling companies wouldn't want this of their own accord because the license gives the employee a bit more bargaining power. It also creates a large risk to the SWEs due to the lack of codified standards and the inherent difficulty in software testing. It's not like a mechanical engineer who can confidently claim a system is safe because it was built to ASME standards.


  • >the practice of deviating from standardized best-practices should be communicated in terms of the risk rather than claiming it's just subjective.

    The problem I see with this is that programming could be described as a kind of general problem solving. Other engineering disciplines standardize methods that are far more specific, e.g. how to tighten screws.

    It's hard to come up with specific rules for general problems though. Algorithms are just solution descriptions in a language the computer and your colleagues can understand.

    When we look at specific domains, e.g. finance and accounting software, we see industry standards have already emerged, like dealing with fixed point numbers instead of floating point to make calculation errors predictable.

    If we now start codifying general software engineering, I'm worried we will just codify subjective opinions about general problem solving. And that will stop any kind of improvement.

    Instead we have to accept that our discipline is different from the others, and more of a design or craft discipline.
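A small illustration of why the finance convention mentioned above exists, in Python: two common options are integer minor units (true fixed point) and `decimal.Decimal` quantized to two places with an explicit rounding mode. Which one a given finance system uses varies; this is a sketch, not a statement of any particular standard.

```python
from decimal import Decimal, ROUND_HALF_EVEN

# Binary floating point cannot represent 0.10 or 0.20 exactly.
print(0.10 + 0.20 == 0.30)   # False
print(round(2.675, 2))       # 2.67: the stored float is slightly below 2.675

# Option 1: fixed point as integer minor units (cents); + and - are exact.
a_cents, b_cents = 10, 20
print(a_cents + b_cents == 30)  # True


# Option 2: decimal arithmetic rounded to a fixed scale with an explicit,
# predictable rounding rule (banker's rounding here).
def to_cents(amount: Decimal) -> Decimal:
    return amount.quantize(Decimal("0.01"), rounding=ROUND_HALF_EVEN)


print(to_cents(Decimal("0.10") + Decimal("0.20")) == Decimal("0.30"))  # True
print(to_cents(Decimal("2.675")))  # 2.68
```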

    • >kind of general problem solving

      Could you elaborate on this distinction? At the superficial level, "general problem solving" is exactly how I describe engineering in general. The example of tightening screws is just a specific example of a fastening problem. In that context, codified standards are an industry consensus on how to solve a specific problem. Most people wrenching on their cars are not following ASME torque guidelines but somebody building a spacecraft should be. It helps define the distinction of a professional build for a specific system. Fastening is the "general problem"; fastening certain materials for certain components in certain environments is the specific problem that the standards uniquely address.

      For software, there are quantifiable measures. For example, some sorting algorithms are objectively faster than others. For systems where that matters in terms of risk, it probably shouldn't be left up to the subjective eye of an individual programmer, just as the spacecraft shouldn't rely on a technician's subjective opinion that a bolt is "meh, tight enough."
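As a sketch of one such quantifiable measure, the code below counts comparisons made by a textbook insertion sort and a textbook merge sort on the same reversed input. Comparison counts are only one metric (real performance also depends on memory access, constant factors, and input distribution), but they are objective and reproducible.

```python
def insertion_sort(xs: list[int]) -> tuple[list[int], int]:
    """Textbook insertion sort; returns (sorted list, comparison count)."""
    a, comparisons = list(xs), 0
    for i in range(1, len(a)):
        j = i
        while j > 0:
            comparisons += 1
            if a[j - 1] <= a[j]:
                break
            a[j - 1], a[j] = a[j], a[j - 1]
            j -= 1
    return a, comparisons


def merge_sort(xs: list[int]) -> tuple[list[int], int]:
    """Top-down merge sort; returns (sorted list, comparison count)."""
    comparisons = 0

    def sort(a: list[int]) -> list[int]:
        nonlocal comparisons
        if len(a) <= 1:
            return a
        mid = len(a) // 2
        left, right = sort(a[:mid]), sort(a[mid:])
        merged, i, j = [], 0, 0
        while i < len(left) and j < len(right):
            comparisons += 1
            if left[i] <= right[j]:
                merged.append(left[i])
                i += 1
            else:
                merged.append(right[j])
                j += 1
        merged.extend(left[i:])
        merged.extend(right[j:])
        return merged

    return sort(list(xs)), comparisons


data = list(range(1000, 0, -1))  # reversed input: worst case for insertion sort
ins_sorted, ins_count = insertion_sort(data)
mrg_sorted, mrg_count = merge_sort(data)
print(ins_count)  # 499500, i.e. n(n-1)/2
print(mrg_count)  # far fewer, on the order of n log2 n
```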

      >I'm worried we will just codify subjective opinions about general problem solving.

      Ironically, this is the same attitude found in many circles of traditional engineering. People who don't want to adhere to industry standards have their own subjective ideas about how to solve the problem. Standards aren't always right, but they create a starting point to 1) identify a risk and 2) find an acceptable way to mitigate it.

      >Instead we have to accept that our discipline is different from the others

      I strongly disagree with this and I've seen this sentiment used (along with "it's just software") to justify all kinds of bad design choices.


  • > At some point CS as a profession has to find the right balance of art and science.

    That seems like such a hard problem. Why not tackle a simpler one?

    • I didn’t downvote but I’ll weigh in on why I disagree.

      The glib answer is “because it’s worth it.” As software interfaces with more and more of our lives, managing the risks becomes increasingly important.

      Imagine if I transported you back 150 years to when the industrial revolution and steam power were just starting to take hold. At that time there were no consensus standards about what makes a mechanical system “good”; it was much more art than science. The numbers of mishaps and the reliability reflected this. However, as our knowledge grew we not only learned about what latent risks were posed by, say, a boiler in your home but we also began to define what is an acceptable design risk. There’s still art involved, but the science we learned (and continue to learn) provides the guardrails. The Wild West of design practice is no longer acceptable due to the risk it incurs.

    • I imagine that's part of why different programming languages exist; e.g., you have slightly fewer footguns with Java than with C++.

      The problem is, the nature of writing software intrinsically requires a balance of art and science no matter what language it is. That is because solving business problems is a blend of art and science.

      It's a noble aim to try to avoid solving unnecessarily hard problems, but when it comes to the customer, a certain amount of the difficulty is incompressible. So you can't avoid it.

Yes, coding at scale is about managing complexity. No, "Keeping methods short" is not a good way to manage complexity, because...

> then mentally model the entire graph of interactions at once

...partially applies even if you have well-named functional boundaries. You said it yourself:

> The complexity (sum total of possible interactions) grows as the number of lines within a functional boundary grows. The cognitive load to understand code grows as the number of possible interactions grows.

Programs have a certain essential complexity. Making a function "simpler" means making it less complex, which means that that complexity has to go somewhere else. If you make all of your functions simple, then you simply need more functions to represent the same program, which increases the total number of possible interactions between nodes and therefore the cognitive load of understanding the whole graph/program.

Allowing more complexity in your functions makes them individually harder to understand, but reduces the total number of functions needed and therefore makes the entire program more comprehensible.

Also note that just because a function's implementation is complex doesn't mean that its interface also has to be complex.

And, functions with complex implementations are only themselves difficult to understand - functions with complex interfaces make the whole system more difficult to understand.
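A hypothetical example of a complex implementation behind a simple interface: callers of `median` pass a list and get a number, while the internals use randomized quickselect (expected linear time) through a private `_select` helper instead of sorting. The function names are illustrative, not from any library.

```python
import random


def median(values: list[float]) -> float:
    """Return the median of a non-empty list. Callers need nothing else."""
    if not values:
        raise ValueError("median() of empty list")
    n = len(values)
    if n % 2 == 1:
        return _select(values, n // 2)
    return (_select(values, n // 2 - 1) + _select(values, n // 2)) / 2


def _select(items: list[float], k: int) -> float:
    """Return the k-th smallest element (0-based) via randomized quickselect."""
    while True:
        pivot = random.choice(items)
        lows = [x for x in items if x < pivot]
        pivots = [x for x in items if x == pivot]
        highs = [x for x in items if x > pivot]
        if k < len(lows):
            items = lows  # answer is among the smaller elements
        elif k < len(lows) + len(pivots):
            return pivot  # answer equals the pivot
        else:
            k -= len(lows) + len(pivots)
            items = highs  # answer is among the larger elements
```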

  • This is where Occam's Razor applies - do not multiply entities unnecessarily.

    Having hundreds or thousands of simple functions is the opposite of this advice.

    You can also consider this in more scientific terms.

    Code is a mental model of a set of operations. The best possible model has as few moving parts as possible, there are as few connections between the parts as possible, each part is as simple as possible, and both the parts and the connections between them are as intuitively obvious as possible.

    Making parts as simple as possible is just one design goal, and not a very satisfactory or useful one in its own terms.

    All of this turns out to be incredibly hard, and is a literal IQ test. Mediocre developers will always, always create overcomplicated solutions. Top developers have a magical ability to combine a 10,000 foot overview with ground level detail, and will tear through complex problems and reduce them to elegant simplicity.

    IMO we should spend less time teaching algorithms and testing algorithmic specifics, and more on analysing complex systems and implementing them with minimal, elegant, intuitive models.

    • Lately I’ve found decoupling to be helpful in this regard.

      This is an auth layer; its primary charge is to ensure those receiving and modifying resources have the permissions to do so.

      This is the data storage layer. It’s focused on clean, relatively generic data storage abstractions and models that are relatively unopinionated, and flexible.

      This is the contract layer. It’s more concerned with combining the APIs of the data and auth layers than it is with data transformation or business logic.

      This is the business logic layer. It takes relatively abstract data from our API and performs transformations to massage it into shapes that fit the needs of our customers and the mental models we’ve created around those requirements.

      Etc. Etc.

      Of course this pragmatic decoupling is easier said than done, but the logical grouping of like concerns allows for discoverability, flexibility, and a generally clear demarcation of concerns.
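A minimal sketch of that layering, with invented names (`Storage`, `Auth`, `Api`, `order_summary`) and a toy in-memory store standing in for a real database; each layer only talks to the one beneath it.

```python
from typing import Optional


class Storage:
    """Data storage layer: generic, unopinionated key/value persistence."""

    def __init__(self) -> None:
        self._rows: dict[str, dict] = {}

    def get(self, key: str) -> Optional[dict]:
        return self._rows.get(key)

    def put(self, key: str, row: dict) -> None:
        self._rows[key] = row


class Auth:
    """Auth layer: only answers 'may this user do this?'."""

    def __init__(self, grants: dict[str, set[str]]) -> None:
        self._grants = grants  # user -> permissions, e.g. {"read", "write"}

    def require(self, user: str, permission: str) -> None:
        if permission not in self._grants.get(user, set()):
            raise PermissionError(f"{user} lacks {permission!r} permission")


class Api:
    """Contract layer: combines auth and storage; no business rules here."""

    def __init__(self, auth: Auth, storage: Storage) -> None:
        self._auth, self._storage = auth, storage

    def read(self, user: str, key: str) -> Optional[dict]:
        self._auth.require(user, "read")
        return self._storage.get(key)

    def write(self, user: str, key: str, row: dict) -> None:
        self._auth.require(user, "write")
        self._storage.put(key, row)


def order_summary(api: Api, user: str, order_id: str) -> str:
    """Business logic layer: shapes raw rows into what customers need."""
    order = api.read(user, order_id)
    if order is None:
        return f"{order_id}: not found"
    lines = order["lines"]  # list of (quantity, unit_price_cents)
    total = sum(qty * price for qty, price in lines)
    return f"{order_id}: {len(lines)} lines, total {total} cents"
```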


  • >If you make all of your functions simple, then you simply need more functions to represent the same program

    The semantics of the language and the structure of the code help hide irrelevant functional units from the global namespace. Methods attached to an object only need to be considered when operating on some object, for example. Private methods do not pollute the global namespace nor do they need to be present in any mental model of the application unless it is relevant to the context.

    While I do think you can go too far with adding functions for its own sake, I don't see that they add to the cognitive load in the same way that possible interactions within a functional unit do. If you're just polluting a global namespace with functions and tiny objects, then that does similarly increase cognitive load and should be avoided.

  • > No, "Keeping methods short" is not a good way to manage complexity

    Agreed

    > Allowing more complexity in your functions makes them individually harder to understand

    I think that can mostly be avoided by sometimes creating local scopes {..} to limit how much state is live inside a function, combined with whitespace and some section "header" comments (in place of what would have been sub-function names).

    Can be quite readable I think. And nice to not have to jump back and forth between myriads of files and functions

I have found this to be one of those A or B developer personas that are hard for someone to change, and causes much disagreement. I personally agree 100%, but have known other people who couldn't disagree more, it is what it is.

I've always felt it had a strong correlation to top-down vs bottom-up thinkers in terms of software design. The top-down folks tend to agree with your stance and the bottom-up group do not. If you're naturally going to want to understand all of the nitty gritty details you want to be able to wrap your head around those as quickly as possible. If you're willing to think in terms of the abstractions you want to remove as many of those details from sight as possible to reduce visual noise.

  • I wish there was an "auto-flattener"/"auto-inliner" tool that would allow you to automagically turn code that was written top-down, with lots of nicely high-level abstractions, into an equivalent code with all the actions mushed together and with infrastructure layers peeled away as much as possible.

    Have you ever seen a codebase with infrastructure and piping taking about 70% of the code, with tiny pieces of business logic thrown here and there? It's impossible to figure out where the actual job is being done (and what it actually is): all you can see is just an endless chain of methods that mostly just delegate the responsibility further and further. What could've been a 100-line loop of the "foreach item in worklist, do A, B, C" kind is instead split over seven tightly cooperating classes that devote 45% of their code to multiplexing/load-balancing/messaging/job-spooling/etc., another 45% to building trivial auxiliary structure and instantiating each other, and only 10% actually devoted to the actual data processing, but good luck finding those 10%, because there is a never-ending chain of calling each other: A.do_work() calls B.process_item() which calls A.on_item_processing() which calls B.on_processed()... wait, shouldn't there have been some work done between "on_item_processing" and "on_processed"? Yes, it was done by an inconspicuously named "prepare_next_worklist_item" function.

    Ah, and the icing on the cake: looping is actually done from the very bottom of this call chain by doing a recursive call to the top-most method, which at this point is about 20 layers above the current stack frame. Just so you can walk down this path again, now with feeling.

    • Your comment gives me emotional flashbacks. Years ago I took Java off my resume, because I don’t want to ever interact with this sort of thing again. (I’m sure it exists in other languages, but I’ve never seen it quite as bad as in Java.)

      I think the best “clean code” programming advice is the advice writers have been saying for centuries. Find your voice. Be direct and be brief. But not too brief. Programming is a form of expression. Step 1 is to figure out what you’re trying to say (eg the business logic). Then say it in its most natural form (switch statements? If-else chain? Whatever). Then write the simplest scaffold around it you can so it gets called with the data it needs.

      The 0th step is stepping away from your computer and naming what you want your program to express in the first place. I like to go for walks. Clear code is an expression of clear thoughts. You’ll usually know when you’ve found it because it will seem obvious. “Oh yeah, this code is just X. Now I just have to type it up.”

    • >I wish there was an "auto-flattener"/"auto-inliner" tool

      I'm as big an advocate of "top-down" design as anyone, and I have also wished for such a tool. When you just want to know "what behavior comes next", all the abstractions do get in the way. The IDE should be able to "flatten" the execution path from current context and give you a linear view of the code. Sort of like a trace of a debug session, but generated on-the-fly. But still, I don't think this is the best way to write code.

    • Most editors have code folding. I've noticed this helps when there are comments or it's easy to figure out the branching or what not.

      However, what you're asking for is a design style that's hard to implement I think without language tooling (for example identifying effectful methods).


    • > I wish there was an "auto-flattener"/"auto-inliner" tool that would allow you to automagically turn code that was written top-down, with lots of nicely high-level abstractions, into an equivalent code with all the actions mushed together and with infrastructure layers peeled away as much as possible.

      Learn to read assembly and knock yourself out.


  • While I think you are onto something about top-down vs. bottom-up thinkers, one of the issues with a large codebase is that literally nobody can do the whole thing bottom-up. So you need some reasonable conventions and abstraction, or the whole thing falls apart under its own weight.

    • Yep, absolutely.

      That's another aspect of my grand unifying theory of developers. Those same personas seem to have correlations in other ways: dynamic vs static typing, languages, monolith vs microservices. How one perceives complexity, what causes one to complain about complexity, etc. all vary based on these things. It's easy to arrive in circumstances where people are arguing past each other.

      If you need to be able to keep all the details in your head, you're going to need smaller codebases. Similarly, if you're already keeping track of everything, things like static typing become less important to you. And the opposite is true.


  • I’m reminded of an earlier HN discussion about an article called The Wrong Abstraction, where I argued¹ that abstractions have both a benefit and a cost and that their ratio may change as a program evolves and which of those “nitty gritty details” are immediately relevant and which can helpfully be hidden behind abstractions changes.

    ¹ https://news.ycombinator.com/item?id=23742118

  • The point is that bottom-up code is a siren song. It never scales. It makes it a lot easier to get started, but given enough complexity it inevitably breaks down.

    Once your codebase gets to somewhere around the 10,000 line mark, it becomes impossible for a single mind to hold the entire program in their head at a single time. The only way to survive past that point is with carefully thought out, water tight layers of abstractions. That almost never happens with bottom-up. Bottom-up is a lot like natural selection. You get a lot of kludges that work great to solve their immediate problem, but behave in undefined and unpredictable ways when you extend them outside their original environment.

    Bottom-up can work when you're inside well-encapsulated modular components with bounded scope and size. But there's no way to keep those modules loosely coupled unless you have an elegant top-down architecture imposing order on the large-scale structure.

    • But the reverse is also true. Top-down programming doesn't really work well for smaller programs, and it definitely doesn't work well when you're dealing with small, highly performance-critical or complex tasks.

      So sure, I'll grant that when your program reaches the 10,000 line mark, you need to have some serious abstractions. I'll even give you that you might need to start abstracting things when a file reaches 1,000 lines.

      But when we start talking about the rule of 30 -- that's not managing complexity, that's alphabetizing a sock drawer and sewing little permanent labels on each sock. That approach also doesn't scale to large programs because it makes rewrites and refactors into hell, and it makes new features extremely cumbersome to quickly iterate on. Your 10,000 line program becomes 20,000 lines because you're throwing interfaces and boilerplate all over the place.

      Note that this isn't theoretical, I have worked in programs that did everything from building an abstraction layer over the database in case we wanted to use Mongo and SQL at the same time (we didn't), to having a dependency management system in place that meant we had to edit 5 files every time we wanted to add a new class, to having a page lifecycle framework that was so complicated that half of our internal support requests were trying to figure out when it was safe to start adding customer data to the page.

      The benefit of a good, long, single-purpose function that contains all of its logic in one place is that you know exactly what the dependencies are, you know exactly what the function is doing, you know that no one else is calling into the inlined logic that you're editing, and you can easily move that code around and change it without worrying about updating names or changing interfaces.

      Abstract your code, but abstract your code when or shortly before you hit complexity barriers and after you have enough knowledge to make informed decisions about which abstractions will be helpful -- don't create a brand new interface every time you write a single function. It's fine to have a function that's longer than a couple hundred lines. If you're building something like a rendering or update loop, in many cases I would say it's preferable.


    • As mainly a bottom-up person, I completely agree with your analysis but I wonder if you might be using "top-down architecture" here in an overloaded way?

      My personal style is bottom up, maximally direct code, aiming for monolithic modules under 10kloc, combined with module coupling over very narrow interfaces. Generally the narrow interfaces emerge from finding the "natural grain" of the module after writing it, not from some a priori top-down idea of how the communication pathways should be shaped.

      Edit: an example of a narrow interface might be having a 10kloc quantitative trading strategy module that communicates with some larger system only by reading off a queue of things that might need to be traded, and writing to a queue of desired actions.
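A hypothetical sketch of that narrow interface using the standard `queue` module: the strategy's internals could be arbitrarily large, but its whole contract with the outside world is one input queue of `Candidate` records and one output queue of `Action` records. The record types, the pricing rule, and the `None` shutdown sentinel are illustrative assumptions.

```python
import queue
from dataclasses import dataclass
from typing import Optional


@dataclass
class Candidate:
    symbol: str
    price: float
    fair_value: float


@dataclass
class Action:
    symbol: str
    side: str  # "buy" or "sell"


def run_strategy(inbox: queue.Queue, outbox: queue.Queue, threshold: float = 0.02) -> None:
    """Read Candidates until a None sentinel; emit Actions. Nothing else is shared."""
    while True:
        candidate: Optional[Candidate] = inbox.get()
        if candidate is None:
            return
        # Arbitrarily complex internals could live here; this toy rule trades
        # only when price is more than `threshold` away from fair value.
        if candidate.price < candidate.fair_value * (1 - threshold):
            outbox.put(Action(candidate.symbol, "buy"))
        elif candidate.price > candidate.fair_value * (1 + threshold):
            outbox.put(Action(candidate.symbol, "sell"))
```

In a real system the two queues would typically be fed and drained by other threads or processes; the point is that nothing else crosses the module boundary.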

> The complexity (sum total of possible interactions) grows as the number of lines within a functional boundary grows.

That's only 1 part of the complexity equation.

When you have 100 lines in 1 function you know exactly the order in which each line will happen and under which conditions by just looking at it.

If you split it into 10 functions 10-lines-long each now you have 10! possible orderings of calling these functions (ignoring loops and branches). And since this ordering is separated into multiple places - you have to keep it in your mind. Good luck inventing naming that will make obvious which of the 3628800 possible orderings is happening without reading through them.

Short functions are good when they fit the problem. Often they don't.

  • I feel like this is only a problem if the small functions share a lot of global state. If each one acts upon its arguments and returns values without side effects, ordering is much less of an issue IMO.

    • Well, if they were one function before they probably share some state.

      Clean code recommends turning that function into a class and promoting the shared state from local variables into fields. After such a "refactoring" you get a nice puzzle trying to understand what exactly happens.
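A toy illustration of that puzzle, with invented names: the class version keeps former locals as fields, so calling methods in the wrong order silently produces a wrong answer, while the pure-function version makes the only valid order explicit in its arguments and return values.

```python
# Former locals promoted to fields: each method silently depends on
# fields that earlier calls were supposed to set.
class ReportBuilder:
    def __init__(self, raw: list[str]) -> None:
        self.raw = raw
        self.rows: list[int] = []
        self.total = 0

    def parse(self) -> None:
        self.rows = [int(x) for x in self.raw]

    def sum_rows(self) -> None:
        self.total = sum(self.rows)  # silently 0 if parse() wasn't called first

    def render(self) -> str:
        return f"total={self.total}"


# Same steps as pure functions: data flow, and therefore call order,
# is spelled out by arguments and return values.
def parse(raw: list[str]) -> list[int]:
    return [int(x) for x in raw]


def render(total: int) -> str:
    return f"total={total}"


def build_report(raw: list[str]) -> str:
    return render(sum(parse(raw)))
```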


    • Yes and no.

      What I find is that function boundaries have a bunch of hidden assumptions we don't think about.

      Especially things like exceptions.

      For all these utility functions, are you going to check input variables, which means doing it over and over again? Catch exceptions everywhere?

      A function can be used for a 'narrow use case' - but - when it's actually made available to other parts of the system, it needs to be kind of more generalized.

      This is the problem.

      Is it possible that 'nested functions' could provide a solution? As in, you only call the function once, in the context of some other function, so why not physically put it there?

      It can have its own stack and be tested separately if needed, but it remains exclusive to the context it is in from a readability perspective, and you don't risk having it used for 'other things'.

      You could even have an editor 'collapse' the function into a single line of code, to make the longer algorithm more readable.
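Python already supports this. A hypothetical sketch: the input is validated once at the public boundary, and the nested `scale` helper, which nothing outside `normalize_scores` can call, relies on that check instead of repeating it.

```python
def normalize_scores(scores: list[float]) -> list[float]:
    """Scale scores linearly into [0, 1]. Input is validated once, here."""
    if not scores:
        raise ValueError("scores must be non-empty")
    lo, span = min(scores), max(scores) - min(scores)

    # Nested helper: callable only inside normalize_scores, so it can trust
    # the check above instead of re-validating its input.
    def scale(s: float) -> float:
        return (s - lo) / span if span else 0.0

    return [scale(s) for s in scores]
```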


  • >If you split it into 10 functions 10-lines-long each now you have 10! possible orderings of calling these functions (ignoring loops and branches). And since this ordering is separated into multiple places - you have to keep it in your mind. Good luck inventing naming that will make obvious which of the 3628800 possible orderings is happening without reading through them.

    It's easy to make this argument in the abstract, but harder to demonstrate with a concrete example. Do you happen to have any 100 lines of code that you could provide that would show this as a challenge to compare to the refactored code?

    You're likely missing one or more techniques that make this work well:

    1. Depth first function ordering, so the execution order of the lines in the function is fairly similar to that of the expanded 100 lines. This makes top to bottom readability reasonable.

    2. Explicit naming of the functions to make it clear what they do, not just part1(); part2() etc.

    3. Similar levels of abstraction in each function (e.g. not mixing a for loop, several if statements based on variables defined in the function, and 3 method calls, but instead having 4-5 method calls doing the same thing).

    4. Explicit pre/post conditions in each method, called out by the parameters passed in and the values returned. This helps a reader understand the lifecycle of relevant variables, etc.

    In your example of 100 lines, the counterpoint is that now I have a method that has at least 100 ways it could work / fail. By breaking that up, I have the ability to reason about each use case / failure mode.
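
    Here is a hypothetical sketch of techniques 1, 2 and 4 in JavaScript. The invoice domain and all names are invented, and amounts are integer cents to keep the arithmetic exact. Function declarations are hoisted in JavaScript, so the caller can come first without forward declarations.

    ```javascript
    // Top-level function first: it reads as a summary of the steps, all at one level.
    function buildInvoice(order) {
      const lines = priceLineItems(order.items);
      const subtotal = sumLineTotals(lines);
      const tax = computeTax(subtotal, order.taxRate);
      return { lines, subtotal, tax, total: subtotal + tax };
    }

    // Helpers follow in the order they are first called (depth-first), so reading
    // top to bottom roughly matches execution order. Inputs arrive as parameters
    // and results leave as return values, with no shared mutable state.
    function priceLineItems(items) {
      return items.map((item) => ({ ...item, lineTotal: item.unitPrice * item.quantity }));
    }

    function sumLineTotals(lines) {
      return lines.reduce((sum, line) => sum + line.lineTotal, 0);
    }

    function computeTax(subtotal, taxRate) {
      return Math.round(subtotal * taxRate); // round to whole cents
    }
    ```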

    • > It's easy to make this argument in the abstract, but harder to demonstrate with a concrete example.

      One of the codebases I'm currently working on is a big example of that. I obviously can't share parts of it, but I'll say that I agree with GP. Lots of tiny functions kill readability.

      > 1. Depth first function ordering, so the execution order of the lines in the function is fairly similar to that of the expanded 100 lines. This makes top to bottom readability reasonable.

      Assuming your language supports this. C++ notably doesn't, especially in the cases where you'd produce such small functions - inside a single translation unit, in an anonymous namespace, where enforcing "caller before callee" order would require you to forward-declare everything up front. Which is work, and more lines of code.

      > 2. Explicit naming of the functions to make it clear what they do, not just part1(); part2() etc.

      That's table stakes. Unfortunately, quite often a properly descriptive name would be 100+ characters long, which obviously nobody does.

      > 3. Similar levels of abstraction in each function

      That's a given, but in a way, each "layer" of such functions introduces its own sublevel of abstraction, so this leads to abstraction proliferation. Sometimes those abstractions are necessary, but I found it easier when I can handle them through few "deep" (as Ousterhout calls it) functions than a lot of "shallow" ones.

      > 4. Explicit pre/post conditions in each method

      These introduce a lot of redundant code, just so that the function can ensure a consistent state for itself. It's such a big overhead that, in practice, people skip those checks, and rely on everyone remembering that these functions are "internal" and had their preconditions already checked. Meanwhile, a bigger, multi-step function can check those preconditions once.
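
      Here is a small hypothetical illustration of that trade-off, with invented names. The split version ends up validating the same array three times per call, while the multi-step version checks once at its boundary.

      ```javascript
      // One shared validator, so both versions reject the same inputs.
      function assertScores(scores) {
        if (!Array.isArray(scores) || scores.length === 0 || !scores.every(Number.isFinite)) {
          throw new TypeError("scores must be a non-empty array of finite numbers");
        }
      }

      // Split version: each helper can be called on its own, so each re-checks its input.
      function minScore(scores) {
        assertScores(scores); // check #2
        return Math.min(...scores);
      }

      function maxScore(scores) {
        assertScores(scores); // check #3, same array again
        return Math.max(...scores);
      }

      function normalizeScoresSplit(scores) {
        assertScores(scores); // check #1
        const min = minScore(scores);
        const range = maxScore(scores) - min;
        return scores.map((s) => (range === 0 ? 0 : (s - min) / range));
      }

      // Multi-step version: validate once at the boundary; later steps rely on it.
      function normalizeScores(scores) {
        assertScores(scores);
        const min = Math.min(...scores);
        const range = Math.max(...scores) - min;
        return scores.map((s) => (range === 0 ? 0 : (s - min) / range));
      }
      ```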

    • > You're likely missing one or more techniques that make this work well:

      I know how to do it, I just don't always think it's worth it.

      > Do you happen to have any 100 lines of code that you could provide that would show this as a challenge to compare to the refactored code?

      Not 100 lines, just 34, but it's a good example of a function I wouldn't split even if it got to 300 lines.

          function getFullParameters() {
              const result = {
                  "gridType": { defaultValue: 1, randomFn: null, redraw: onlyOneRedraw("grid"), },
                  "gridSize": { defaultValue: 32, randomFn: null, redraw: onlyOneRedraw("grid"), },
                  "gridOpacity": { defaultValue: 40, randomFn: null, redraw: onlyOneRedraw("grid"), },
                  "width": { defaultValue: 1024, randomFn: null, redraw: allRedraws(), },
                  "height": { defaultValue: 1024, randomFn: null, redraw: allRedraws(), },
                  "seed": { defaultValue: 1, randomFn: () => Math.round(Math.random() * 65536), redraw: allRedraws(), },
                  "treeDensity": { defaultValue: 40, randomFn: () => Math.round(Math.random() * 100), redraw: onlyOneRedraw("trees"), },
                  "stoneDensity": { defaultValue: 40, randomFn: () => Math.round(Math.random() * 20 * Math.random() * 5), redraw: onlyOneRedraw("stones"), },
                  "twigsDensity": { defaultValue: 40, randomFn: () => Math.round(Math.random() * 20 * Math.random() * 5), redraw: onlyOneRedraw("twigs"), },
                  "riverSize": { defaultValue: 3, randomFn: () => Math.random() > 0.5 ? Math.round(Math.random() * 10) : 0, redraw: onlyRedrawsAfter("river"), },
                  "roadSize": { defaultValue: 0, randomFn: () => Math.random() > 0.5 ? Math.round(Math.random() * 10) : 0, redraw: onlyRedrawsAfter("river"), },
                  "centerRandomness": { defaultValue: 20, randomFn: () => Math.round(30), redraw: onlyOneRedraw("trees"), },
                  "leavedTreeProportion": { defaultValue: 95, randomFn: () => Math.round(Math.random() * 100), redraw: onlyOneRedraw("trees"), },
                  "treeSize": { defaultValue: 50, randomFn: () => Math.round(30) + Math.round(Math.random() * 40), redraw: onlyOneRedraw("trees"), },
                  "treeColor": { defaultValue: 120, randomFn: () => Math.round(Math.random() * 65536), redraw: onlyOneRedraw("trees"), },
                  "treeSeparation": { defaultValue: 40, randomFn: () => Math.round(80 + Math.random() * 20), redraw: onlyOneRedraw("trees"), },
                  "serrationAmplitude": { defaultValue: 130, randomFn: () => Math.round(80 + Math.random() * 40), redraw: onlyOneRedraw("trees"), },
                  "serrationFrequency": { defaultValue: 30, randomFn: () => Math.round(80 + Math.random() * 40), redraw: onlyOneRedraw("trees"), },
                  "serrationRandomness": { defaultValue: 250, randomFn: () => Math.round(100), redraw: onlyOneRedraw("trees"), },
                  "colorRandomness": { defaultValue: 30, randomFn: () => Math.round(20), redraw: onlyOneRedraw("trees"), },
                  "clearings": { defaultValue: 9, randomFn: () => Math.round(3 + Math.random() * 10), redraw: onlyRedrawsAfter("clearings"), },
                  "clearingSize": { defaultValue: 30, randomFn: () => Math.round(30 + Math.random() * 20), redraw: onlyRedrawsAfter("clearings"), },
                  "treeSteps": { defaultValue: 2, randomFn: () => Math.round(3 + Math.random() * 2), redraw: onlyOneRedraw("trees"), },
                  "backgroundNo": { defaultValue: 1, randomFn: null, redraw: onlyTheseRedraws(["background", "backgroundCover"]), },
                  "showColliders": { defaultValue: 0, randomFn: null, redraw: onlyOneRedraw("colliders"), },
                  "grassLength": { defaultValue: 85, randomFn: () => Math.round(25 + Math.random() * 50), redraw: onlyTheseRedraws(["background", "backgroundCover"]), },
                  "grassDensity": { defaultValue: 120, randomFn: () => Math.round(25 + Math.random() * 50), redraw: onlyTheseRedraws(["background", "backgroundCover"]), },
                  "grassSpread": { defaultValue: 45, randomFn: () => Math.round(5 + Math.random() * 25), redraw: onlyTheseRedraws(["background", "backgroundCover"]), },
                  "autoredraw": { defaultValue: true, randomFn: null, redraw: noneRedraws(), },
              };
              return result;
          }
      

      There's a lot of value in having all of this in one place. Ordering isn't a problem here, just no need to refactor.
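
      Here is a hedged sketch of why the one-table layout pays off: generic code can walk every row the same way. This is a trimmed, hypothetical version of the table. The original redraw helpers aren't shown in the thread, so plain strings stand in for them. Also, `randomFn` here takes an injected `rng` instead of calling `Math.random` directly; that is an assumption made so results are reproducible.

      ```javascript
      // Trimmed, hypothetical version of the table above (see lead-in for the changes).
      function getParameters() {
        return {
          width: { defaultValue: 1024, randomFn: null, redraw: "all" },
          seed: { defaultValue: 1, randomFn: (rng) => Math.round(rng() * 65536), redraw: "all" },
          treeDensity: { defaultValue: 40, randomFn: (rng) => Math.round(rng() * 100), redraw: "trees" },
        };
      }

      // Generic consumers: adding a parameter means adding one row to the table,
      // not editing these functions.
      function defaults(params) {
        return Object.fromEntries(
          Object.entries(params).map(([name, p]) => [name, p.defaultValue])
        );
      }

      function randomized(params, rng = Math.random) {
        return Object.fromEntries(
          Object.entries(params).map(([name, p]) => [name, p.randomFn ? p.randomFn(rng) : p.defaultValue])
        );
      }

      // Which part of the scene must be redrawn when a parameter changes.
      function redrawFor(params, name) {
        return params[name].redraw;
      }
      ```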

  • I am surprised that this is the top answer (edit: at the time of writing, it was).

    How does splitting code into multiple functions suddenly change the order of the code?

    I would expect that these functions would be still called in a very specific order.

    And sometimes it does not even make sense to keep this order.

    But here is a little example (in a made up pseudo code):

      function positiveInt calcMeaningOfLife(positiveInt[] values)
        positiveInt total = 0
        positiveInt max = 0
      for (positiveInt i=0; i < values.length; i++)
          total = total + values[i]
          max = values[i] > max ? values[i] : max
        return total - max
    

    ===>

      function positiveInt max(positiveInt[] values)
        positiveInt max = 0
        for (positiveInt i=0; i < values.length; i++) 
          max = values[i] > max ? values[i] : max
        return max
    
      function positiveInt total(positiveInt[] values)
        positiveInt total = 0
        for (positiveInt i=0; i < values.length; i++) 
          total = total + values[i]
        return total
    
      function positiveInt calcMeaningOfLife(positiveInt[] values)
        return total(values)-max(values)
    

    Better? No?
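
    For anyone who wants to run the comparison, here is the same pseudocode translated into JavaScript as a sketch. JavaScript has no `positiveInt` type, so the non-negative assumption (which is what makes 0 a safe starting value for the maximum) is only stated in comments.

    ```javascript
    // Inline version: one pass, both accumulators visible together.
    function calcMeaningOfLifeInline(values) {
      let total = 0;
      let max = 0; // like the pseudocode, assumes non-negative inputs
      for (let i = 0; i < values.length; i++) {
        total = total + values[i];
        max = values[i] > max ? values[i] : max;
      }
      return total - max;
    }

    // Split version: two single-purpose helpers and a one-line composition.
    function max(values) {
      let largest = 0; // 0 is a safe floor only because inputs are non-negative
      for (const v of values) largest = v > largest ? v : largest;
      return largest;
    }

    function total(values) {
      let sum = 0;
      for (const v of values) sum += v;
      return sum;
    }

    function calcMeaningOfLife(values) {
      return total(values) - max(values);
    }
    ```

    Both return the same results. The split version walks the array twice instead of once, which rarely matters at this size.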

    • > How does splitting code into multiple functions suddenly change the order of the code?

      Regardless of how smart your compiler is and all the tricks it pulls to execute the code in much the same order, the order in which humans read the pseudo code is changed:

        01. function positiveInt max(positiveInt[] values)
        02.   positiveInt max = 0
        03.   for (positiveInt i=0; i < values.length; i++) 
        04.     max = values[i] > max ? values[i] : max
        05.   return max
      
        07. function positiveInt total(positiveInt[] values)
        08.   positiveInt total = 0
        09.   for (positiveInt i=0; i < values.length; i++) 
        10.     total = total + values[i]
        11.   return total
      
        12. function positiveInt calcMeaningOfLife(positiveInt[] values)
        13.   return total(values) - max(values)
      
      

      Your modern compiler will take care of the order in which the code is executed, but humans need to trace the code line by line as [13, 12, 01, 02, 03, 04, 05, 07, 08, 09, 10, 11]. By comparison, the inline case can be understood sequentially by reading lines 01 to 07 in order:

        01. function positiveInt calcMeaningOfLife(positiveInt[] values)
        02.   positiveInt total = 0
        03.   positiveInt max = 0
        04.   for (positiveInt i=0; i < values.length; i++) 
        05.     total = total + values[i]
        06.     max = values[i] > max ? values[i] : max
        07.   return total - max
      

      > Better? No?

      In most cases, yeah, you're probably better off with the two helper functions. max() and total() are common enough operations, and they are named well enough that we can easily guess their intent without having to read the function body.

      However, depending on the size of the codebase, the complexity of the surrounding functions and the location of the two helper functions it's easy to see that this might not always be the case.

      If you want to try and understand the code for the first time, or if you are trying to trace down some complex bug there's a chance having all the code inline would help you.

      Further, splitting up a large inline function is much easier than reassembling many small functions (hope you've got your unit tests!).

      > And sometimes it does not even make sense to keep this order.

      Agreed. But naming and abstractions are not trivial problems, and it's often in the larger, more complex codebases that you see these practices applied most dogmatically.

There's certainly some difference in priorities between massive 1000-programmer projects where complexity must be aggressively managed and, say, a 3-person team making a simple web app. Different projects will have a different sweet spot in terms of structural complexity versus function complexity. I've seen code that, IMO, misses the sweet spot in either direction.

Sometimes there is too much code in mega-functions, poor separation of concerns and so on. These are easy mistakes to make, especially for beginners, so there are a lot of warnings against them.

Other times you have too many abstractions and too much indirection to serve any useful purpose. The ratio of named things, functional boundaries, and interface definitions to actual instructions can easily get out of hand when people dogmatically apply complexity-managing patterns to things that aren't very complex. Such over-abstraction can fall under YAGNI and waste time/$ as the code becomes slower to navigate, slower to understand in depth, and possibly slower to modify.

I think in software engineering we suffer more from the former problem than the latter problem, but the latter problem is often more frustrating because it's easier to argue for applying nifty patterns and levels of indirection than omitting them.

Just for a tangible example: if I have to iterate over a 3D data structure with X, Y, and Z dimensions, and use 3 nested loops to do so, is that too complex a function? I'd say no. It's at least as clear without introducing more functional boundaries, which would be effort with no benefit.
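
Here is a minimal sketch of that case. It assumes the data is stored as nested arrays indexed `grid[x][y][z]`, and the function name is invented for illustration.

```javascript
// Sum every cell of a 3D grid stored as nested arrays: grid[x][y][z].
function sumGrid(grid) {
  let sum = 0;
  for (let x = 0; x < grid.length; x++) {
    for (let y = 0; y < grid[x].length; y++) {
      for (let z = 0; z < grid[x][y].length; z++) {
        sum += grid[x][y][z];
      }
    }
  }
  return sum;
}
```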

Well named functions are only half (or maybe a quarter) of the battle. Function documentation is paramount in complex codebases, since documentation should describe various parameters in detail and outline any known issues, side-effects, or general points about calling the function. It's also a good idea to document when a parameter is passed to another function/method.

Yeah, it's a lot of work, but working on recent projects has really taught me the value of good documentation. Naming a function send_records_to_database is fine, but the name can't tell you how it determines which database to send the records to, how it deals with failed records (if at all), or what alternative use cases the function supports. All of that must come from documentation (or from reading the source of that function).
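
As a hedged illustration, here is the kind of doc comment meant, on a hypothetical JavaScript `sendRecordsToDatabase`. The routing-by-region rule and the in-memory `Map` of databases are invented for the example. The comment answers the questions the name can't: which database, what happens to failures, and what gets mutated.

```javascript
/**
 * Sends each record to the database for its region. (Hypothetical example.)
 *
 * Database selection: a record goes to `databases.get(record.region)`. A record
 * whose region has no configured database counts as a failure.
 *
 * Failure handling: failed records are NOT retried. They are returned in
 * `failed` with a reason, and the remaining records are still sent.
 *
 * Side effects: appends to the target databases' `rows` arrays. The input
 * `records` array is not modified.
 *
 * @param {Array<{id: string, region: string}>} records
 * @param {Map<string, {rows: Array<object>}>} databases keyed by region name
 * @returns {{sent: number, failed: Array<{id: string, reason: string}>}}
 */
function sendRecordsToDatabase(records, databases) {
  const failed = [];
  let sent = 0;
  for (const record of records) {
    const db = databases.get(record.region);
    if (!db) {
      failed.push({ id: record.id, reason: `no database configured for region "${record.region}"` });
      continue;
    }
    db.rows.push(record);
    sent += 1;
  }
  return { sent, failed };
}
```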

Plus, I've found that forcing myself to write function documentation, and justify my decisions, has resulted in me putting more consideration into design. When you have to say, "this function reads <some value> name from <environment variable>," you have to spend some time considering whether future users will find that to be a sound decision.

  • > documentation should describe various parameters in detail and outline any known issues, side-effects, or general points about calling the function. It's also a good idea to document when a parameter is passed to another function/method.

    I'd argue that writing that much documentation about a single function suggests that the function is a problem and the "send_records_to_database" example is a bad name. It's almost inevitable that the function doing so much and having so much behavior that needs documentation will, at some point, be changed and make the documentation subtly wrong, or at least incomplete.

    • What's the alternative? Small functions get used in other functions. Eventually you end up with a function everyone's calling that's doing the same logic, just itself calling into smaller functions to do it.

      You can argue that there should be separate functions for `send_to_database` and `lock_database` and `format_data_for_database` and `handle_db_error`. But you're still going to have to document the same stuff. You're still going to have to remind people to lock the database in some situations. You're still going to have to worry about people forgetting to call one of those functions.

      And eventually you're going to expose a single endpoint/interface that handles an entire database transaction including stuff like data sanitation and error handling, and then you're going to need to document that endpoint/interface in the same way that you would have needed to document the original function.
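
      Here is a hypothetical sketch of that end state. It uses JavaScript, in-memory stand-ins, and camelCase versions of the names above. The four helpers still exist, but callers use `saveRecords`, and its doc comment ends up restating the locking and error-handling contract anyway.

      ```javascript
      // Hypothetical in-memory stand-ins for the four helpers named above.
      function lockDatabase(db) {
        if (db.locked) throw new Error("database is already locked");
        db.locked = true;
        return () => { db.locked = false; }; // returns the matching unlock
      }

      function formatDataForDatabase(record) {
        // JSON.stringify throws on values it can't serialize (e.g. BigInt).
        return { id: String(record.id), payload: JSON.stringify(record.data ?? null) };
      }

      function sendToDatabase(db, row) {
        if (!db.locked) throw new Error("sendToDatabase requires the database to be locked");
        db.rows.push(row);
      }

      function handleDbError(err, record) {
        return { id: record.id, reason: err.message };
      }

      /**
       * The single entry point callers actually use. Locks the database for the
       * whole batch, formats and sends each record, collects per-record failures
       * instead of aborting, and always unlocks. Returns the list of failures.
       */
      function saveRecords(db, records) {
        const unlock = lockDatabase(db);
        const failures = [];
        try {
          for (const record of records) {
            try {
              sendToDatabase(db, formatDataForDatabase(record));
            } catch (err) {
              failures.push(handleDbError(err, record));
            }
          }
        } finally {
          unlock(); // release even if something unexpected throws
        }
        return failures;
      }
      ```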

  • Yikes, I hope I don't have to read documentation to understand how the code deals with failed records or other use cases. Good code would have the use cases separated from the send_records_to_database so it would be obvious what the records were and how failure conditions are handled.

      How else are you going to understand how a library works besides RTFM or RTFC? I guess the third option is copy pasta from Stack Overflow and hope your use case doesn't require any significant deviation?

      You seriously never have to read documentation?

      Must be nice, I've been balls-deep in GCP libraries and even simple things like pulling from a PubSub topic have footguns and undocumented features in certain library calls. Like subscriber.subscribe returns a future that triggers a callback function for each polled message, while subscriber.pull returns an array of messages.

      That's a pretty damn obvious case where functions should have been named "obviously" (pull_async, pull_sync), yet they weren't. And that's from a very widely used service from one of the biggest tech companies out there, written by a person who presumably passed one of the hardest interviews in the industry and gets paid in the top 1% or so of developers.

      Without documentation, I would have never figured those out.

  • "Plus, I've found that forcing myself to write function documentation, and justify my decisions, has resulted in me putting more consideration into design."

    This, this, and... this.

    Sometimes, I step back after writing documentation and realise, this is a bunch of baloney. It could be much simpler, or this is a terrible decision! My point: Writing documentation is about expressing the function a second time -- the first time was code, the second time was natural language. Yeah, it's not a perfect 1:1 (see: the law in any developed country!), but it is a good heuristic.

  • Documentation is only useful if it is up to date and correct. I ignore documentation because I've never found both of those to be true.

    There are contract/proof systems that seem like they might help. At least the tool ensures the contract is correct. However, I'm not sure whether such systems are readable. (I've never used one in the real world.)

    • Oh I agree, but a person who won't take the time to update documentation after a significant change, certainly isn't going to refactor the code such that the method name matches the updated functionality. Assuming they can even update the name if they wanted to.

      After all, documentation is cheap. If you're going to write a commit message, why not also update the function docs with pretty much the same thing? "Filename parameter will now use S3 if an appropriate URI is passed (e.g., filename='s3://bucket/object/path.txt'). Note: doesn't work with path-style URLs."
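
      Here is a sketch of that commit message turned into a doc comment. `resolveTarget` is a hypothetical helper that only classifies the filename and makes no AWS SDK calls.

      ```javascript
      /**
       * Resolves where a filename points. (Hypothetical helper.)
       *
       * If `filename` is an S3 URI such as "s3://bucket/object/path.txt", the result
       * is { kind: "s3", bucket, key }. Anything else is treated as a local path.
       *
       * Note: path-style HTTPS URLs ("https://s3.amazonaws.com/bucket/key") are NOT
       * recognized as S3 and fall through to the local-path branch.
       *
       * @param {string} filename
       * @returns {{kind: "s3", bucket: string, key: string} | {kind: "local", path: string}}
       */
      function resolveTarget(filename) {
        const match = /^s3:\/\/([^/]+)\/(.+)$/.exec(filename);
        if (match) {
          return { kind: "s3", bucket: match[1], key: match[2] };
        }
        return { kind: "local", path: filename };
      }
      ```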

> The idea of code telling a story is that a unit of work should explain what it does through its use of well named variables, function/object names, and how data flows between function/objects.

Code telling a story is a fallacy that programmers keep telling themselves and which fails to die. Code doesn't tell stories, programmers do. Code can't explain why it exists; it can't tell you about the buggy API it relies on and which makes its implementation weird and not straight-forward; it can't say when it's no longer needed.

Good names are important, but it's false that having well-chosen function and arguments names will tell a programmer everything they need to know.

  • >Code doesn't tell stories, programmers do. Code can't explain why it exists;

    Code can't tell every relevant story, but it can tell a story about how it does what it does. Code is primarily written for other programmers. Writing code in such a way that other people with some familiarity with the problem space can understand easily should be the goal. But this means telling a story to the next reader, the story of how the inputs to some functional unit are translated into its outputs or changes in state. The best way to explain this to another human is almost never the best way to explain it to a computer. But since we have to communicate with other humans and to the computer from the same code, it takes some effort to bridge the two paradigms. Having the code tell a story at the high level by way of the modules, objects and methods being called is how we bridge this gap. But there are better and worse ways to do this.

    Software development is a process of translating the natural language-spec of the system into a code-spec. But you can have the natural language-spec embedded in the structure of the code to a large degree. The more, the better.

    • Code is not primarily written for other programmers. It's written for the computer, the primary purpose is to tell the computer what to do. Readability is desirable, but inherently secondary to that concern, and abstraction often interferes with your ability to understand and express what is actually happening on the silicon - even if it improves your ability to communicate the abstract problem. Is that worth it? It's not straightforward.

      An overemphasis on readability is how you get problems like "Twitter crashing not just the tab but people's entire browser for multiple years". Silicon is hard to understand, but hiding it behind abstractions also hides the fundamental territory you're operating in. By introducing abstractions, you may make high-level problems easier to tackle, but you make it much harder to tackle low-level problems that inevitably bubble up.

      A good symptom of this is that the vast majority of JS developers don't even know what a cache miss is, or how expensive it is. They don't know that linearly traversing an array is thousands of times faster than linearly traversing a (fragmented) linked list. They operate in such an abstract land that they've never had to grapple with the actual nature of the hardware they're operating on. Performance issues that arise as a result of that are a great example of readability obscuring the fundamental problem.
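
      Here is a rough JavaScript sketch for trying this yourself. It sums the same values from a contiguous `Float64Array` and from a linked list of separate objects. Timings depend heavily on engine and hardware, and a single cold run includes JIT warm-up. Nodes allocated one after another may also end up close together in memory, so this likely understates a truly fragmented list. Treat the numbers as indicative only.

      ```javascript
      const { performance } = require("perf_hooks");

      // Contiguous storage: elements sit next to each other in one buffer.
      function sumArray(arr) {
        let sum = 0;
        for (let i = 0; i < arr.length; i++) sum += arr[i];
        return sum;
      }

      // Pointer-chasing storage: every element is its own heap object.
      function buildList(arr) {
        let head = null;
        for (let i = arr.length - 1; i >= 0; i--) head = { value: arr[i], next: head };
        return head;
      }

      function sumList(head) {
        let sum = 0;
        for (let node = head; node !== null; node = node.next) sum += node.value;
        return sum;
      }

      const N = 500_000;
      const data = new Float64Array(N).map((_, i) => i % 100);
      const list = buildList(data);

      const t0 = performance.now();
      const arraySum = sumArray(data);
      const t1 = performance.now();
      const listSum = sumList(list);
      const t2 = performance.now();

      console.log(`array: ${(t1 - t0).toFixed(2)} ms, list: ${(t2 - t1).toFixed(2)} ms, equal: ${arraySum === listSum}`);
      ```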

  • > Code doesn't tell stories, programmers do

    It's like saying books don't tell stories, writers do.

    • It is, but GP's point is pretty clear. Perhaps a better way to express it would be: unlike natural languages, programming languages are insufficiently expressive for the code to tell the full story. That's why books tell stories, and code is - at best - Cliff's Notes.

  • Is code just a byproduct of specs then? Any thoughts on literate programming?

      Literate programming is for programs that are static and don't change much. It works great for those cases, though.

      No, what works is the same as what worked 20 years ago. Nothing has truly changed. You still have layers upon layers that sometimes pass something and sometimes don't, and sometimes you wish they passed something and sometimes you wish they didn't.

Your argument falls apart once you need to actually debug one of these monstrosities, as often the bug itself also gets spread out over half a dozen classes and functions, and it's not obvious where to fix it.

More code, more bugs. More hidden code, more hidden bugs. There's a reason those who have worked in software development longer tend to prefer less abstraction: most of them are those who have learned from their experiences, and those who aren't are "architects" optimising for job security.

  • If a function is only called once, it should just be inlined; the IDE can collapse it. A descriptive comment can replace the function name. It can be a lambda with an immediate call and explicit captures if you need to prevent the problem of not knowing which local variables it interacts with as the function grows significantly; or, if the concern is others using its leftover variables, its own variables can go into a plain scope. Making you jump to a different area of the code to read it just breaks up the linear flow for no gain, especially since you often have to read it anyway to make sure it doesn't have global side effects. You might as well read it in the single place it is used.

    If it is going to be used more than once (and actually is), then make a function (unless it is so trivial that the explicit inline version is more readable). If you are designing a public API where it may need to be overridden, count it as more than once.

    Some of the above is language dependent.
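
    In JavaScript, the closest equivalents are an immediately-invoked arrow function and a plain block. The sketch below is hypothetical, and the order/discount logic is invented. Unlike C++ lambdas, JavaScript closures capture implicitly, so there is no explicit capture list to enforce what the block touches.

    ```javascript
    // Hypothetical order total: items are { price, qty } in whole currency units.
    function processOrder(order) {
      const items = order.items;

      // Immediately-invoked arrow function: reads like inline code, keeps `sum`
      // private, and hands back a single named result.
      const subtotal = (() => {
        let sum = 0;
        for (const item of items) sum += item.price * item.qty;
        return sum;
      })();

      // Plain block scope: `discount` cannot leak into the rest of the function.
      let total;
      {
        const discount = subtotal > 100 ? Math.round(subtotal / 10) : 0; // 10% over 100
        total = subtotal - discount;
      }

      return total;
    }
    ```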

    • I don't get this. This is literally what the 'one level of abstraction' rule is for.

      If you can find a good name for a piece of code I don't need to read in detail, why do you want to make me skip from line 1458 to line 2345 to skip over the details of how you do that thing? And why would you add a comment on it instead of making it a function that is appropriately named and I don't have to break my reading flow to skip over a horrendously huge piece of code?

> The best code is code you don't have to read because of well named functional boundaries.

I don't know which is harder. Explaining this about code, or about tests.

The people with no sense of DevX see nothing wrong with writing tests that fail as:

    Expected undefined to be "foo"

If you make me read the tests to modify your code, I'm probably going to modify the tests. Once I modify the tests, you have no idea if the new tests still cover all of the same concerns (especially if you wrote tests like the above).

Make the test red before you make it green, so you know what the errors look like.

  • Oh god. Or just the tests that are walls of text, mixes of mocks and initializers and constructors and method calls.

    Like good god, extract that boilerplate into a function. Use comments and whitespace to break it up and explain the workflow.
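
    Here is a hypothetical sketch of what that extraction can look like; the `Cart` class and its collaborators are invented. Setup lives in one factory with defaults, and each test overrides only the values it cares about.

    ```javascript
    const assert = require("assert");

    // Hypothetical class under test.
    class Cart {
      constructor(pricing, inventory) {
        this.pricing = pricing;
        this.inventory = inventory;
        this.items = [];
      }

      add(sku, qty) {
        if (this.inventory.available(sku) < qty) throw new Error(`not enough ${sku}`);
        this.items.push({ sku, qty });
      }

      total() {
        return this.items.reduce((sum, { sku, qty }) => sum + this.pricing.priceOf(sku) * qty, 0);
      }
    }

    // All the setup boilerplate, in one named helper with sensible defaults.
    function makeCart({ prices = { apple: 50 }, stock = { apple: 10 } } = {}) {
      const pricing = { priceOf: (sku) => prices[sku] };
      const inventory = { available: (sku) => stock[sku] ?? 0 };
      return new Cart(pricing, inventory);
    }

    // Each test now states only what differs from the defaults.
    function testTotalUsesConfiguredPrice() {
      const cart = makeCart({ prices: { apple: 30 } });
      cart.add("apple", 2);
      assert.strictEqual(cart.total(), 60);
    }

    function testRejectsMoreThanStock() {
      const cart = makeCart({ stock: { apple: 1 } });
      assert.throws(() => cart.add("apple", 2), /not enough apple/);
    }

    testTotalUsesConfiguredPrice();
    testRejectsMoreThanStock();
    ```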

    • I have a couple of people who use a wall of boilerplate to do something 3 lines of mocks could handle, without coupling the tests to each other in the process.

      Every time I have to add a feature I end up rewriting the tests. But you know, code coverage, so yay.

    • I see this with basically any JavaScript test. Yes, mocking any random import is really cool and powerful, but for fuck's sake, can we just use a DI container so that the tests don’t look like a satanic invocation?
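
      Here is a deliberately tiny, hypothetical DI container to show the idea; real projects would more likely use an existing library. Production code registers factories by name, and a test overrides two registrations instead of mocking module imports.

      ```javascript
      // Tiny container: named factories, resolved lazily, one instance each.
      class Container {
        constructor() {
          this.factories = new Map();
          this.instances = new Map();
        }

        register(name, factory) {
          this.factories.set(name, factory);
          this.instances.delete(name); // re-registering replaces any cached instance
          return this;
        }

        resolve(name) {
          if (!this.instances.has(name)) {
            const factory = this.factories.get(name);
            if (!factory) throw new Error(`nothing registered for "${name}"`);
            this.instances.set(name, factory(this));
          }
          return this.instances.get(name);
        }
      }

      // Hypothetical service: receives its dependencies instead of importing them.
      class ReminderService {
        constructor(clock, mailer) {
          this.clock = clock;
          this.mailer = mailer;
        }

        sendIfDue(task) {
          if (task.dueAt > this.clock.now()) return false;
          this.mailer.send(task.owner, `Due: ${task.title}`);
          return true;
        }
      }

      function productionContainer() {
        return new Container()
          .register("clock", () => ({ now: () => Date.now() }))
          .register("mailer", () => ({ send: (to, msg) => console.log(`to ${to}: ${msg}`) }))
          .register("reminders", (c) => new ReminderService(c.resolve("clock"), c.resolve("mailer")));
      }

      // In a test: override two registrations; no module mocking needed.
      const sent = [];
      const testContainer = productionContainer()
        .register("clock", () => ({ now: () => 1000 }))
        .register("mailer", () => ({ send: (to, msg) => sent.push({ to, msg }) }));
      const reminders = testContainer.resolve("reminders");
      ```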

  • > Make the test red before you make it green, so you know what the errors look like.

    Oh! I like this. I never considered this particular reason why making tests fail first might be a good idea.

“There are two ways of constructing a software design: One way is to make it so simple that there are obviously no deficiencies, and the other way is to make it so complicated that there are no obvious deficiencies. The first method is far more difficult.” ― C. A. R. Hoare

this quote scales

  • This quote does not scale. Software contains essential complexity because it was built to fulfill a need. You can make all of the beautiful, feature-impoverished designs you want - they won't make it to production, and I won't use them, because they don't do the thing.

    If your software does not do the thing, then it's not useful, it's a piece of art - not an artifact of software engineering that is meant to fulfill a purpose.

But not everybody codes “at scale”. If you have a small, stable team, there is a lot less to worry about.

Secondly, it is often better to start with fewer abstractions and boundaries, and add them when the need becomes apparent, rather than trying to remove ill-conceived boundaries and abstractions that were added earlier.

  • Coding at scale is not dependent on the number of people, but on the essential complexity of the problem. One can fail at a one-man project due to lack of proper abstraction with a sufficiently complex problem. Like, try to write a compiler.

> The idea of code telling a story is that a unit of work should explain what it does through its use of well named variables, function/object names, and how data flows between function/objects. If you have to dig into the details of a function to understand what it does, you have failed to sufficiently explain what the function does through its naming and set of arguments.

That's fine in theory and I still sort-of believe that, but in practice, I came to believe most programming languages are insufficiently expressive for this vision to be true.

Take, as a random example, this bit of C++:

  //...
  const auto foo = Frobnicate(bar, Quuxify);

Ok, I know what Frobnification is. I know what Quuxify does, it's defined a few lines above. From that single line, I can guess it Frobs every member of bar via Quuxify. But is bar modified? Gotta check the signature of Frobnicate! That means either getting an IDE help popup, or finding the declaration.

  template<typename Stuffs, typename Fn>
  auto Frobnicate(const std::vector<Stuffs>&, Fn)
    -> std::vector<Stuffs>;

From the signature, I can see that bar full of Bars isn't going to be modified. But then I think, is foo.size() going to be equal to bar.size()? What if bar is empty? Can Frobnicate throw an exception? Are there any special constraints on the function Fn passed to it? Does Fn have to be a funcallable thing? Can't tell that until I pop into definition of Frobnicate.

I'll omit the definition here. But now that I see it, I realize that Fn has to be a function of a very particular signature, that Fn is applied to every other element of the input vector (and not all of them, as I assumed), that the code has a bug and will crash if the input vector has fewer than 2 elements, and that it calls three other functions that may or may not have their own restrictions on arguments, and may or may not throw an exception.

If I don't have a fully-configured IDE, I'll likely just ignore it and bear the risk. If I have, I'll routinely jump-to-definition into all these functions, quickly eye them for any potential issues... and, if I have the time, I'll put a comment on top of Frobnicate declaration, documenting everything I just learned - because holy hell, I don't want to waste my time doing the same thing next week. I would rename the function itself to include extra details, but then the name would be 100+ characters long...

Some languages are better at this than others, but my point is, until we have programming languages that can (and force you to) express the entire function contract in its signature and enforce this at compile-time, it's unsafe to assume a given function does what you think it does. Comments would be a decent workaround, if most programmers could be arsed to write them. As it is, you have to dig into the implementation of your dependencies, at least one level deep, if you want to avoid subtle bugs creeping in.
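
Here is a hedged sketch of the comment being described. The thread omits the real definition of Frobnicate, so this JavaScript stand-in only reimplements the behavior described in prose: it applies the function to every other element and has trouble with fewer than 2 elements. Turning the crash into an explicit `RangeError` is a choice made for this sketch.

```javascript
/**
 * frobnicate(items, fn): hypothetical JS stand-in for the C++ Frobnicate above,
 * with the contract written down instead of rediscovered.
 *
 * - Does not modify `items`; returns a new array.
 * - Calls `fn(element, index)` on every OTHER element (indices 0, 2, 4, ...),
 *   not on all of them, so the result has Math.ceil(items.length / 2) entries.
 * - Throws RangeError when items.length < 2 (where the original crashed).
 * - Any exception thrown by `fn` propagates unchanged.
 *
 * @template T
 * @param {T[]} items
 * @param {(element: T, index: number) => T} fn
 * @returns {T[]}
 */
function frobnicate(items, fn) {
  if (items.length < 2) {
    throw new RangeError(`frobnicate needs at least 2 items, got ${items.length}`);
  }
  const result = [];
  for (let i = 0; i < items.length; i += 2) {
    result.push(fn(items[i], i));
  }
  return result;
}

// Hypothetical Quuxify: any function matching the documented signature.
const quuxify = (x) => x * 10;
```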

  • This is a good point and I agree. In fact, I think this really touches on why I always had a hard time understanding C++ code. I first learned to program with C/C++ so I have no problem writing C++, but understanding other people's code has always been much more difficult than other languages. Its facilities for abstraction were (historically) subpar, and even things like aliased variables where you have to jump to the function definition just to see if the parameter will be modified really get in the way of easy comprehension. And then the nested template definitions. You're right that how well relying on well named functional boundaries works depends on the language, and languages aren't at the point where it can be completely relied on.

  • This is true but having good function names will at least help you avoid going two levels deep. Or N levels. Having a vague understanding of a function call’s purpose from its name helps because you have to trim the search tree somewhere.

    Though, if you’re in a nest of tiny forwarding functions, who knows how deep you’ll have to go?

    • > having good function names will at least help you avoid going two levels deep. Or N levels.

      I agree. You have to trim your search space, or you'll never be able to do anything. What I was trying to say is, I don't know of any language that would allow you to only ever rely on function names/signatures. None that I've worked with could do that in practice.

      > if you’re in a nest of tiny forwarding functions, who knows how deep you’ll have to go?

      That's the reason I hate the "Clean Code"-ish pattern of lots of very tiny functions. I worked in a codebase written in this style, and doing anything with it felt like it was 90% jumping around function definitions, desperately trying to keep them all in my working memory.

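The C++ aliasing complaint above can be shown in a few lines. In this sketch (both function names are hypothetical), the two call sites look identical, `f(name)`, and only the declarations reveal that one of them rewrites its argument through a non-const reference.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Takes a non-const reference: the caller's string is modified in place.
void lowercase_in_place(std::string& s) {
    for (char& c : s) {
        c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
    }
}

// Takes its parameter by value: the caller's string is never modified.
std::string lowercased_copy(std::string s) {
    lowercase_in_place(s);  // modifies only the local copy
    return s;
}

// At a call site the two read the same way:
//   std::string lower = lowercased_copy(name);  // `name` untouched
//   lowercase_in_place(name);                   // `name` rewritten
```

For contrast, Rust requires `&mut name` at the call site for the mutating case, so the aliasing is visible without jumping to the definition.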

Function names are comments, and have similar failure modes.

  • Comments that are limited to two or three dozen characters at most, so worse than comments, in my experience.

    • You can put your prose at the top of the function if you really need to explain it more. :)

  • But it's easier to notice they're outdated, because you don't see them only when looking at the implementation.

> If you have to dig into the details of a function to understand what it does, you have failed to sufficiently explain what the function does through its naming and set of arguments.

Which is often unavoidable; many functions are insufficiently explained by those alone unless you want four-word camelcase monstrosities for names. The code of the function should be right-sized. Size and complexity need to be balanced there: simpler and easier-to-follow is sometimes larger. I work on compilers, query processors and compute engines; the cognitive load from the subject domains is bad enough without making the code arbitrarily shaped.

[edit] oh yes, what jzoch says below. Locality helps with taming the network of complexity between functions and data.

[edit] oh no, here come the downvotes!

  • > ...many functions are insufficiently explained by [naming and set of arguments] alone unless you want four-word camelcase monstrosities for names.

    Come now, is four words really all that "monstrously" much?

    > The code of the function should be right-sized.

    Feels like that should go for its name too.

    > Size and complexity need to be balanced there: simpler and easier-to-follow is sometimes larger.

    The longer the code, the longer the name?

    • There's quite a bit of sentiment against long names around here. I'm personally fine with them up to about 30-35 chars or so; past that they start to really intrude. Glad you’re not put off by choosing function over form!


I think we need to recognize the limits of this concept. To reach for an analogy, both Dr. Seuss and Tolstoy wrote well but I'd much rather inherit source code that reads like 10 pages of the former over 10 pages of the latter. You could be a genuine code-naming artist but at the end of the day all I want to do is render the damn HTML.

> If you have to dig into the details of a function to understand what it does, you have failed to sufficiently explain what the function does through its naming and set of arguments.

This isn't always true in my experience. Often when I need to dig into the details of a function it's because how it works is more important than what it says it's doing. There are implementation concerns you can't fit into a function name.

Additionally, I have found that function names become outdated at about the same rate as comments do. If the common criticism of code commenting is that "comments are code you don't run", function names also fall into that category.

I don't have a universal rule on this, I think that managing code complexity is highly application-dependent, and dependent on the size of the team looking at the code, and dependent on the age of the code, and dependent on how fast the code is being iterated on and rewritten. However, in many cases I've started to find that it makes sense to inline certain logic, because you get rid of the risk of names going out of date just like code comments, and you remove any ambiguity over what the code actually does. There are some other benefits as well, but they're beyond the scope of the current conversation.

Perfect abstractions are relatively rare, so in instances where abstractions are likely to be very leaky (which happens more often than people suspect), it is better to be extremely transparent about what the code is doing, rather than hiding it behind a function name.

> The complexity (sum total of possible interactions) grows as the number of lines within a functional boundary grows.

I'll also push back against this line of thought. The sum total of possible interactions does not decrease when you move code out into a separate function. The same number of lines of code still get run, and each line carries the same potential to have a bug. In fact, in many cases, adding additional interfaces between components and generalizing them can increase the number of code paths and potential failure points.

If you define complexity by the sum total of possible interactions (which is itself a problematic definition, but I'll talk about that below), then complexity always increases when you factor out functions, because the interfaces, error-handling, and boilerplate code around those functions increase the number of possible interactions happening during your function call.
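A toy C++ sketch of that claim (all function names hypothetical): the same averaging logic, once inlined and once extracted. The extracted helper can be called from anywhere, so it can no longer lean on its caller's context; it grows its own failure path, and the caller grows a branch to handle it, while the arithmetic that actually runs is unchanged.

```cpp
#include <cassert>
#include <optional>
#include <vector>

// Inlined: one emptiness check, then the arithmetic.
double average_inline(const std::vector<double>& xs) {
    if (xs.empty()) return 0.0;
    double sum = 0.0;
    for (double x : xs) sum += x;
    return sum / static_cast<double>(xs.size());
}

// Extracted: as a standalone function, mean_of must defend against
// empty input itself and report that case through its interface.
std::optional<double> mean_of(const std::vector<double>& xs) {
    if (xs.empty()) return std::nullopt;
    double sum = 0.0;
    for (double x : xs) sum += x;
    return sum / static_cast<double>(xs.size());
}

// The caller now handles the helper's failure path at the boundary.
double average_extracted(const std::vector<double>& xs) {
    return mean_of(xs).value_or(0.0);
}
```

Whether that extra boundary pays for itself depends on how many callers share `mean_of`, which is the trade-off being argued in this thread.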

> The complexity (sum total of possible interactions) grows as the number of lines within a functional boundary grows.

What I've come to understand is that complexity is relative. A solution that makes a codebase less complex for one person in an organization may make a codebase more complex for someone else in the organization who has different responsibilities over the codebase.

If you are building an application with a large team, and there are clear divisions of responsibilities, then functional boundaries are very helpful because they hide the messy details about how low-level parts of the code work.

However, if you are responsible for maintaining both the high-level and low-level parts of the same codebase, then separating that logic can sometimes make the program harder to manage, because you still have to understand how both parts of the codebase work, but now you also have to understand how the interfaces and abstractions between them fit together and what their limitations are.

In single-person projects where I'm the only person touching the codebase I do still use abstractions, but I often opt to limit the number of abstractions, and I inline code more often than I would in a larger project. This is because if I'm the only person working on the code, I need to be able to hold almost the entire codebase in my head at the same time in order to make informed architecture decisions, and managing a large number of abstractions on top of their implementations makes the code harder to reason about and increases the number of things I need to remember. This was a hard-learned lesson for me, but has made (I think) an observable difference in the quality and stability of the code I write.

  • >> If you have to dig into the details of a function to understand what it does, you have failed to sufficiently explain what the function does through its naming and set of arguments.

    > This isn't always true in my experience. Often when I need to dig into the details of a function it's because how it works is more important than what it says it's doing. There are implementation concerns you can't fit into a function name.

    Neither of these is quite right. Yes, if you have to dig into the details of a function to understand what it does, it hasn't been explained well enough. No, the prototype cannot contain enough information to explain it. No, you shouldn't look at the implementation either - that leads to brittle code where you start to rely on the implementation behavior of a function that isn't part of the interface.

    The interface and implementation of a function are separate. The former should be clearly-documented - a descriptive name is good, but you'll almost always also need docstrings/comments/other documentation - while you should rarely rely on details of the latter, because if you are, that usually means that the interface isn't defined clearly enough and/or the abstraction boundaries are in the wrong places (modulo things like looking under the hood to refactor, improve performance, etc - all abstractions are somewhat leaky, but you shouldn't be piercing them regularly).

    > If you define complexity by the sum total of possible interactions (which is itself a problematic definition, but I'll talk about that below), then complexity always increases when you factor out functions, because the interfaces, error-handling, and boilerplate code around those functions increases the number of possible interactions happening during your function call.

    This - this is what everyone who advocates for "small functions" doesn't understand.

    • > all abstractions are somewhat leaky, but you shouldn't be piercing them regularly).

      I think this gets back to the old problem of "documentation is code that doesn't run." I'm not saying get rid of documentation -- I comment my code to an almost excessive degree, because I need to be able to remember in the future why I made certain decisions, I need to know what the list of tradeoffs were that went into a decision, I need to know if there are any potential bugs or edge-cases that I haven't tested for yet.

      But what I am saying is that it is uncommon for an interface to be perfectly documented -- not just in code I write, but especially in 3rd-party libraries. It's not super-rare for me to need to dip into library source code to figure out behaviors that they haven't documented, or interfaces that changed between versions and aren't described anywhere. People struggle with good documentation.

      Sometimes that's performance: if a 3rd-party library is slow, sometimes it's because of how it's implemented. I've run into that with d3 addons in the past, where changing how my data is formatted results in large performance gains, and only the implementation logic revealed that to me. Is that a leaky abstraction? Sure, I suppose, but it doesn't seem to be uncommon. Is it fragile? Sure, a bit, but I can't release charts that drop frames whenever they zoom just because I refuse to pay attention to the implementation code.

      So I get what you're saying, but to me "abstractions shouldn't be leaking" is a bit like saying "code shouldn't have bugs", or "minor semver increases should have no breaking changes." I completely agree, but... it does, and they do. Relying on undocumented behavior is a problem, but sometimes documented behavior diverges from implementation. Sometimes the abstractions are so leaky that you don't have a choice.

      And that's not just a problem with 3rd-party code, because I'm also not a perfect programmer, and sometimes my own documentation on internal methods diverges from my implementation. I try very hard not to have that happen, but I also try hard to compensate for the fact that I'm a human being who makes mistakes. I try to build systems that are less work to maintain and less prone to having their documentation decay over time. I've found that in code that I'm personally writing, it can be useful to sidestep the entire problem and inline the entire abstraction. Then I don't have to worry about fragility at all.

      If you're not introducing a 3rd-party library or a separate interface for every measly 50 lines of code, and instead you just embed your single-use chunk of logic into the original function you want to call it in, then you never have to worry about whether the abstraction is leaky. That can have a tangible effect on the maintainability of your program, because it reduces the number of opportunities you have to mess up an interface or its documentation.

      For perfect abstractions, I agree with you. I'm not saying get rid of all abstractions. I just think that perfect abstractions are more difficult and rarer than people suppose, and sometimes for some kinds of logic, a perfect abstraction might not exist at all.
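The interface-versus-implementation distinction argued in this subthread can be sketched in C++ (names hypothetical). The documented contract says result order is unspecified; the current implementation happens to use `std::unordered_set`, whose iteration order depends on hashing and bucket count. A caller that needs ordering asks for it explicitly instead of depending on whatever order the implementation produces today.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <unordered_set>
#include <vector>

// Interface (the contract): returns each distinct tag exactly once.
// The order of the result is unspecified.
// Implementation detail (free to change): built from std::unordered_set.
std::vector<std::string> unique_tags(const std::vector<std::string>& tags) {
    std::unordered_set<std::string> seen(tags.begin(), tags.end());
    return std::vector<std::string>(seen.begin(), seen.end());
}

// A caller that needs a stable order sorts explicitly, relying only on
// the documented contract rather than on today's iteration order.
std::vector<std::string> sorted_unique_tags(const std::vector<std::string>& tags) {
    std::vector<std::string> result = unique_tags(tags);
    std::sort(result.begin(), result.end());
    return result;
}
```

A caller that skipped the sort might pass its tests for years and then break when the hash function or container changes, which is the brittleness being described; the counterpoint above is that, in practice, the contract is often not written down this clearly.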