Comment by hackinthebochs

5 years ago

>Code is not primarily written for other programmers.

I should have said code should be written primarily for other programmers. There are an infinite number of ways to express the same program, and the computer is indifferent to which one it is given. But only a select few are easily understood by another human. Code should be optimized for human readability barring overriding constraints. Granted, in some contexts efficiency is more important than readability down the line. But such contexts are few and far between. Most code does not need to consider the state of the CPU cache, for example.

Joel Spolsky opened my eyes to this issue: code is read more than it is written. In theory, code is written once (then touched up for bugs); for 99.9% of its life, it is read-only. That is a strong case for writing readable code. I try to write my code so that a junior hire can read and maintain it -- at least from a technical standpoint. (They might be clueless about the business logic, but that is fine.) Granted, I am not always successful in this goal!

  • Code should be written for debuggability, not readability. I don't care if it takes someone 20 minutes to understand my algorithm, as long as, once they do understand it, bugs become immediately obvious.

    Most simplification added to your code obscures the underlying operations on the silicon. It's like writing a novel so a 5-year-old can read it, versus writing a novel for a 20-year-old. You want to communicate the same ideas? The kid's version is going to be hundreds of times longer. It's going to take longer to write, longer to read, and you're much more likely to make mistakes related to non-local dependencies. In fact, you're going to turn a lot of local dependencies into non-local dependencies.

    Someone who's competent can digest much more complex input, so you can communicate a lot more in one go. Training wheels may make it so anyone can ride your bike, but they also limit your ability to compete in, say, the Tour de France.

    Also, this is a side note, but "code is read by programmers" is a bit of a platitude IMO - it's wordplay. Your code is also read by the computer a lot more than it's read by other programmers. Keep your secondary audience in mind, but write for your primary audience.

My point was not just about performance - a lot of bugs come from the introduction of abstractions to increase readability, because the underlying algorithms are obscured. Humans are just not that good at reading algorithms. Transforming operations on silicon into a form we can easily digest requires misrepresenting the problem. Every time you add an abstraction, you increase the degree of misrepresentation. You can argue that's worth it because code is read a lot, but it's still a tradeoff.
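
To make that concrete, here's a rough, hypothetical sketch in C (the names and the dedup scenario are mine, purely for illustration): the helper reads nicely at the call site, but it hides what the loop actually does.

    #include <stdbool.h>
    #include <stddef.h>

    /* "Readable" helper: does `needle` already appear in the first `n` elements? */
    static bool contains(const int *xs, size_t n, int needle) {
        for (size_t i = 0; i < n; i++)
            if (xs[i] == needle)
                return true;
        return false;
    }

    /* Reads like one tidy line per element, but each call to contains()
       rescans everything written so far: the loop is O(n^2), and nothing
       at the call site says so. */
    size_t dedup(const int *in, size_t n, int *out) {
        size_t m = 0;
        for (size_t i = 0; i < n; i++)
            if (!contains(out, m, in[i]))
                out[m++] = in[i];
        return m;
    }

The abstraction isn't wrong, but the shape and cost of the underlying work only become visible once you mentally inline it.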

But another point worth considering is that a lot of things that make code easier to read make it much harder to rewrite, and they can also make it harder to debug.

  • >Transforming operations on silicon into a form we can easily digest requires misrepresenting the problem.

    Do you have an example? This is entirely counter to my experience. Of course you can misrepresent the behavior in words, but then you've simply used the wrong words to describe what's going on; that's not an indictment of abstraction in general. Abstractions necessarily leave something out, but what is left out is not an assertion of absence. That is not a misrepresentation.

    • Let me try explaining it a few ways:

      1. ---

      You don't need to assert absence, the abstraction inherently ignores that which is left out, and the reader remains ignorant of it (that's the point, in fact). The abstraction asserts that the information it captures is the most useful information, and arguably it asserts that it is the only relevant information. This may be correct, but it may also be wrong. If it's wrong, any bugs that result will be hard to solve, because the information necessary to understand how A links to B is deliberately removed in the path from A to B.

      2. ---

      An abstraction is a conceptual reformulation of the problem. Each layer of abstraction reformulates the problem. It's lossy compression. Each layer of abstraction is a lossy compression of a lossy compression. You want to minimise the layers because running the problem through multiple compressors loses a lot of information and obscures the constraints of the fundamental problem.

      3. ---

      You don't know a priori whether the information your abstraction leaves out is important.

      I would go further and argue: leaving out the wrong information is usually a disaster, and very hard to reverse. One way to avoid this is to avoid abstractions (not that I'd recommend it, but it's part of the tradeoff).

      4. ---

      Abstractions misrepresent by simplifying. For example, the fundamental problem you're solving is moving electrons through wires. There are problems specific to that level, such as bit instability, which you stop worrying about once you introduce the abstraction of the CPU's ISA.

      Do those problems disappear at the level of the ISA? No, you've just introduced an abstraction which hides them, and hopefully they don't bubble up. The introduction of that abstraction also added overhead, partly in order to ensure the lower-level problems don't bubble up.

      Ok, let's go up a few levels. You're now using a programming language. One of your fundamental problems here is cache locality. Does your code trigger cache misses? Well, it's not always clear, and it becomes less clear the more layers of abstraction you add.

      "But cache locality rarely matters," ok, but sometimes it does, and if you have 10 layers of abstraction, good luck solving that. Can you properly manage cache locality in Clojure? Not a chance. It's too abstract. What happens when your Clojure code is eventually too slow? You're fucked. The abstraction not only makes the problem hard to identify, it makes it impossible to solve.
