Be Aware of the Makefile Effect

7 months ago (blog.yossarian.net)

"A complex system that works is invariably found to have evolved from a simple system that worked. A complex system designed from scratch never works and cannot be patched up to make it work. You have to start over with a working simple system."

– John Gall (1975) Systemantics: How Systems Really Work and How They Fail

https://en.wikipedia.org/wiki/John_Gall_(author)#Gall's_law

  • It's why I'm always very skeptical of new languages and frameworks. They often look great on a PowerPoint slide, but it's not clear how they'll look on something complex and long-lasting.

    They usually pick up warts added for some special case, and that's a sign that there will be infinitely many more.

    There's a fine line between "applying experience" and "designing a whole new system around one pet peeve". But it's a crucial distinction.

    • > "designing a whole new system around one pet peeve"

      BAHHAHAH! So…you mean React. If I hear the word hook as if it alone can solve complexity in web dev one more time I’ll…eh, I’ll do nothing actually. But my point still stands. React solves asynchronous event driven behavior well, but that’s all. Everything else in React projects is, well, everything else.

I have an alternate theory: about 10% of developers can actually start something from scratch because they truly understand how things work (not that they always do it, but they could if needed). Another 40% can get the daily job done by copying and pasting code from local sources, Stack Overflow, GitHub, or an LLM—while kinda knowing what’s going on. That leaves 50% who don’t really know much beyond a few LeetCode puzzles and have no real grasp of what they’re copying and pasting.

Given that distribution, I’d guess that well over 50% of Makefiles are just random chunks of copied and pasted code that kinda work. If they’re lifted from something that already works, job done—next ticket.

I’m not blaming the tools themselves. Makefiles are well-known and not too verbose for smaller projects. They can be a bad choice for a 10,000-file monster—though I’ve seen some cleanly written Makefiles even for huge projects. Personally, it wouldn’t be my first choice. That said, I like Makefiles and have been using them on and off for at least 30 years.

  • > That leaves 50% who don’t really know much beyond a few LeetCode puzzles and have no real grasp of what they’re copying and pasting.

    Small nuance: I think people often don’t know because they don’t have the time to figure it out. There are only so many battles you can fight during a day. For example if I’m a C++ programmer working on a ticket, how many layers of the stack should I know? For example, should I know how the CPU registers are called? And what should an AI researcher working always in Jupyter know? I completely encourage anyone to learn as much about the tools and stack as possible, but there is only so much time.

    • If you spend 80% of your time (and mental energy) applying the knowledge you already have and 20% learning new things, you will very quickly be able to win more battles per day than someone who spends 1% of their time learning new things.

      Specifically for the examples at hand:

      - at 20%, you will be able to write a Makefile from scratch within the first day of picking up the manual, rather than two or three weeks if you only invest 1%.

      - if you don't know what the CPU registers are, you can't get the debugger to tell you why your C++ program dumped core. When you can, that typically lets you resolve the ticket in a few minutes (because most segfaults are stupid problems that are easy to fix once you see what the problem is, though the memorable ones are much hairier). Without knowing how to use the disassembly in the debugger, you're often stuck debugging by printf or even binary search, incrementally tweaking the program until it stops crashing, incurring a dog-slow C++ build after every tweak. As often as not, a fix thus empirically derived will merely conceal the symptom of the bug, so you end up fixing it two or three times, taking several hours each time.

      Sometimes the source-level debugger works well enough that you can just print out C++-level variable values, but often it doesn't, especially in release builds. And for performance regression tickets, reading disassembly is even more valuable.

      (In C#, managed C++, or Python, the story is of course different. Until the Python interpreter is segfaulting.)

      How long does it take to learn enough assembly to use the debugger effectively on C and C++ programs? Tens of hours, I think, not hundreds. At 20% you get there after a few dozen day-long debugging sessions, maybe a month or two. At 1% you may take years.

      What's disturbing is how many programmers never get there. What's wrong with them? I don't understand it.


    • > I completely encourage anyone to learn as much about the tools and stack as possible, but there is only so much time.

      That seems like a weird way to think about this. I mean, sure, there's no time today to learn make to complete your C++ ticket or whatever. But yesterday? Last month? Last job?

      Basically, I think this matches the upthread contention perfectly. If you're a working C++ programmer who's failed to learn the Normal Stable of Related Tools (make, bash, python, yada yada) across a ~decade of education and experience, you probably never will. You're in that 50% of developers who can't start stuff from scratch. It's not a problem of time, but of curiosity.


    • This is the 40% that OP mentioned. But there's a proportion of people/engineers that are just clueless and are incapable of understanding code. I don't know the proportion so can't comment on the 50% number, but they definitely exist.

      If you never worked with them, you should count yourself lucky.

    • We can’t really call the field engineering if this is the standard. A fundamental understanding of what one’s code actually makes the machine do is necessary to write quality code regardless of how high up the abstraction stack it is


    • Funny enough I'm an A̶I̶ML researcher and started in HPC (High Performance Computing).

      > if I’m a C++ programmer ... should I know how the CPU registers are called?

      Probably.

      Especially with "low level"[0] languages knowing some basics about CPU operations goes a long way. You can definitely get away without knowing these things but this knowledge will reap rewards. This is true for a lot of system based information too. You should definitely know about things like SIMD, MIMD, etc because if you're writing anything in C/C++ these days it should be because you care a lot about performance. There's a lot of stuff that should be parallelized that isn't. Even stuff that could be trivially parallelized with OpenMP.

      > what should an AI researcher working always in Jupyter know?

      Depends on what they're researching. But I do wish a lot more knew some OS basics. I see lots of things in papers where they're like "we got 10x performance" on some speed measurement but didn't actually measure it correctly (e.g. you can't use time.time and be accurate because there are lots of asynchronous operations). There are lots of easy pitfalls here that are not at all obvious and will look like they are working correctly. There are things about GPUs that should be known. Things about math and statistics. Things about networking. But this is a broad field so there are of course lots of answers here. I'd at least say anyone working on AI should read at least some text on cognitive science and neuroscience because that's a super common pitfall too.
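
      To make the timing pitfall concrete, here is a minimal Python sketch. It assumes a thread pool as a stand-in for an asynchronous backend such as a GPU queue; `slow_op`, `naive_timing` and `synced_timing` are hypothetical names made up for illustration, not part of any real framework. The naive version stops the clock as soon as the work has been submitted, so it mostly measures the launch. The second version waits for the result first and uses `time.perf_counter`, which is intended for measuring intervals.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def slow_op(seconds: float) -> float:
    """Hypothetical stand-in for work that runs asynchronously (e.g. a GPU kernel)."""
    time.sleep(seconds)
    return seconds


def naive_timing(pool: ThreadPoolExecutor, seconds: float) -> float:
    """Stops the clock right after submitting, so it only measures the launch."""
    start = time.perf_counter()
    future = pool.submit(slow_op, seconds)
    elapsed = time.perf_counter() - start
    future.result()  # let the work finish, but outside the timed region
    return elapsed


def synced_timing(pool: ThreadPoolExecutor, seconds: float) -> float:
    """Waits for the result before stopping the clock."""
    start = time.perf_counter()
    pool.submit(slow_op, seconds).result()  # block until the work is done
    return time.perf_counter() - start


if __name__ == "__main__":
    with ThreadPoolExecutor(max_workers=1) as pool:
        print(f"naive:  {naive_timing(pool, 0.2):.4f} s")
        print(f"synced: {synced_timing(pool, 0.2):.4f} s")
```

      With a real accelerator framework, the role of `.result()` would be played by whatever synchronization call that framework provides; the point is only that the clock has to stop after the work is actually finished.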

      I think it is easy to not recognize that information is helpful until after you have that information. So it becomes easy to put off as not important. You are right that it is really difficult to balance everything, but I'm not convinced this is the problem with those in that category of programmers. There's quite a number of people who insist that they "don't need to" learn things or insist certain knowledge isn't useful based on their "success."

      IMO the key point is that you should always be improving. Easier said than done, but it should be the goal. At worst, I think we should push back on anyone insisting that we shouldn't be (I do not think you're suggesting this).

      [0] Quotes because depends who you talk to. C++ historically was considered a high level language but then what is Python, Lua, etc?

  • Actually it is trivial to write a very simple Makefile for a 10,000 file project, despite the fact that almost all Makefiles that I have ever seen in open-source projects are ridiculously complicated, far more complicated than a good Makefile would be.

    In my opinion, it is almost always a mistake when a Makefile contains an individual rule for making a single file.

    Normally, there should be only generic building rules that should be used for building any file of a given type.

    A Makefile should almost never contain lists of source files or of their dependencies. It should contain only a list with the directories where the source files are located.

    Make should search the source directories, find the source files, classify them by type, create their dependency lists and invoke appropriate building rules. At least with GNU make, this is very simple and described in its user manual.

    If you write a Makefile like this, it does not matter whether a project has 1 file or 10,000 files, the effort in creating or modifying the Makefile is equally negligible. Moreover, there is no need to update the Makefile whenever source files are created, renamed, moved or deleted.

    • If everything in your tree is similar, yes. I agree that's going to be a very small Makefile.

      While this is true, much larger projects that have lived for a long time will have many parts, all with slight differences. For example, over time the language flavour of the day comes and goes. Structure changes in new code. Often different subtrees are there for different platforms or environments.

      The Linux kernel is a good, maybe extreme, but clear example. There are hundreds of Makefiles.


    • Sure, but this will require you to know how to tell the compiler to generate your Makefile header dependencies and if you end up making a mistake, this will cause silent failures.

  • I like Makefiles, but just for me. Each time I create a new personal project, I add a Makefile at the root, even if the only target is the most basic of the corresponding language. This is because I can't remember all the variations of all the languages' and frameworks' build "sequences". But "$ make" is easy.

  • I would just change the percentages, but it is about as true as it gets.

    • I’d be curious to hear your ratio. It really varies. In some small teams with talented people, there are hardly any “fake” developers. But in larger companies, they can make up a huge chunk.

      Where I am now, it’s easily over 50%, and most of the real developers have already left.

      PS: The fakes aren’t always juniors. Sometimes you have junior folks who are actually really good—they just haven’t had time yet to discover what they don’t know. It’s often absolutely clear that certain juniors will be very good just from a small contribution.


  • Being able to set up things and truly understanding how they work are quite different imo.

    I agree with the idea that a lot of productive app developers would not be able to set up a new project ex novo, but often that is less about true understanding than about knowing the correct set of magic rules and incantations that make many tools work well together.

  • > They can be a bad choice for a 10,000-file monster

    Whether they are a bad choice really depends on what the alternatives are, though.

  • > That leaves 50% who don’t really know much beyond a few LeetCode puzzles and have no real grasp of what they’re copying and pasting.

    Who likely wouldn't have a job if it weren't for LLMs.

At my work I've noticed another contributing factor: tools/systems that devs need to interact with at some point, but otherwise provide little perceived value to learn day-to-day.

An example is build system and CI configuration. We absolutely need these but devs don't think they should be expected to deal with them day to day. CI is perceived as a system that should be "set and forget", like yeah we need it but really I have to learn all this just to build the app? Devs expect it to "just work" and if there are complexities then another team (AKA my role) deals with that. As a result, any time devs interact with the system, there's a high motivation to copy from the last working setup and move on with their day to the "real" work.

The best solution I see is to meet the devs halfway. Provide them with tooling that is appropriately simple/complex for the task, provide documentation, minimise belief in "magic". Tools like Make kinda fail here because they are too complex and black-box-like.

  • For me the big problems with CI setups tend to be:

    - They're often slow

    - They're often proprietary

    - They're often dealing with secrets which limits who can work on them

    - You generally can't run them locally

    So the feedback cycle for working on them is incredibly long. And working on them is therefore a massive pain.

    • > You generally can't run them locally

      I recognize that this is such a disincentive for me taking the initiative to fiddle with and learn about anything like this


    • The local part is my big problem too. I used Azure DevOps at work. I find clicking through the UI to be a miserable experience; I'd love to have it running locally so I could view inputs and outputs on the file system. Also, YAML is an awful choice; no one I know enjoys working with it. The whitespace issues just get worse and worse the longer your files get.

  • Strong agree. The best workflow I've seen uses CICD as a very thin wrapper around in-tree scripts or make files.

    If a Dev can run some/all of the "cicd" stuff locally, they can see, control, and understand it. It helps tremendously to have a sense of ownership and calm, vs "cicd is something else, la la la".

    (This doesn't always work. We had a team of two devs, who had thin-wrapper CICD, who pretended it was an alien process and refused to touch it. Weird.)

    • +1. The only CI tool that I've seen really organize around this principle is Buildkite, which I've used and enjoyed. I'm currently using Github Actions and it's fine but Buildkite is literally sooooo good for the reasons you've mentioned.

  • The office coffee machine is not „set and forget”, but you wouldn’t expect the entire responsibility for its maintenance to be evenly distributed between all people that use it. Similarly, CI needs ownership and having it fall on the last developer that attempted to use it is not an efficient way of working.

  • Make is one of the simplest build tools out there. Compared to something like Grunt, Webpack, etc. it’s a hammer compared to a mining drill.

    The solution is to not use tools used by large corporations because they are used by large corporations. My unpopular opinion is that CI/CD is not needed in most places where it’s used. Figure out how to do your builds and deploys with the absolute fewest moving pieces even if it involves some extra steps. Then carefully consider the cost of streamlining any part of it. Buying into a large system just to do a simple thing is often times not worth it in the long run.

    If you really do need CI/CD you will know because you will have a pain point. If that system is causing your developers pain, it isn’t the right fit.

  • If you think `make` is “too complex and black-box-like” then you haven't seen `cmake`.

    • If you think cmake is a good example of more complex than make, then you haven't seen automake/autoconf. That was the first thing I thought of. You can find tons and tons of configure scripts that check if you're running ancient versions of Unix, check that a byte is 8 bits wide, and do a ton of other pointless checks. They don't do anything with all that information (don't think for a moment that you can actually build the app on Irix), but the checks for it have been passed along for decades like junk DNA.


  • Yeah, I think this is the real issue. Too many different tool types that need to interact, so you don't get a chance to get deep knowledge in any of them. If only every piece of software/CI/build/webapp/phone-app/OS was fully implemented in GNU make ;-) There's a tension between using the best tool for the job vs adding yet another tool/dependency.

Make and Makefiles are incredibly simple when they are not autogenerated by autoconf. If they are generated by autoconf, don’t modify them, they are a build artifact. But also, ditch autoconf if you can.

In the broader sense: yes this effect is very real. You can fall to it or you can exploit it. How I exploit it: write a bit of code (or copy/paste it from somewhere). Use it in a project. Refine as needed. When starting the next project, copy that bit of code in. Modify for the second project. See if changes can be backported to the original project. Once both are running and are in sync, extract the bit of code and make it into a library. Sometimes this takes more projects to distill the thing into what a library should be. In the best case, open source the library so others can use it.

  • They are also extremely limited. Timestamp-based freshness is often broken by modern VCSes. Git doesn’t record file modification times, so a checkout can (and often does) update a file’s mtime even when its contents are the same, causing unnecessary rebuilds.

    They also are utterly unable to handle many modern tools whose inputs and/or outputs are entire directories or whose output names are not knowable in advance of running the tool.

    I love make. I have put it to good use in spite of its shortcomings and know all the workarounds for them, and the workarounds for the workarounds, and the workarounds for those workarounds. Making a correct Makefile when you end up with tools that don’t perfectly fit into its expectations escalates rapidly in difficulty and complexity.
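
    The mtime problem from the first paragraph can be sketched in a few lines of Python. This is only an illustration of the difference between timestamp-based and content-based freshness, not how make or any particular build tool is implemented; `content_digest`, `touch_with_same_content` and `demo` are hypothetical helpers. The file is rewritten with identical bytes and given a newer mtime, which is roughly what a checkout can do.

```python
import hashlib
import os
import tempfile
from pathlib import Path
from typing import Tuple


def content_digest(path: Path) -> str:
    """Hypothetical helper: SHA-256 of the file's bytes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def touch_with_same_content(path: Path) -> None:
    """Rewrite identical bytes and push the mtime forward."""
    data = path.read_bytes()
    path.write_bytes(data)
    stat = path.stat()
    # Bump mtime by 10 s explicitly so the demo does not depend on clock resolution
    os.utime(path, (stat.st_atime, stat.st_mtime + 10))


def demo() -> Tuple[bool, bool]:
    """Return (mtime_changed, content_changed) for a rewritten-but-identical file."""
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "example.c"
        src.write_text("int main(void) { return 0; }\n")
        old_mtime, old_digest = src.stat().st_mtime, content_digest(src)
        touch_with_same_content(src)
        mtime_changed = src.stat().st_mtime > old_mtime
        content_changed = content_digest(src) != old_digest
        return mtime_changed, content_changed


if __name__ == "__main__":
    mtime_changed, content_changed = demo()
    print(f"mtime changed:   {mtime_changed}")
    print(f"content changed: {content_changed}")
```

    The mtime changes while the digest does not. A purely timestamp-based tool would rebuild everything that depends on this file; a tool that compares content would not.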

    • I started using ccache to speed up Make, but soon found that it allowed me to replace Make entirely with a bash script using a few functions.

  • They are simple but very often wrong. It's surprisingly hard to write Makefiles that will actually do the right thing under anything other than "build from scratch" scenarios. (No, I'm not joking. The very existence of the idea of "make clean" is the smoking gun.)

    • I disagree, but I think once a project gets beyond a certain level of complexity you may need to move beyond make. For simple projects though I usually do something like:

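          CC=clang
          MODULES=gtk+-3.0 json-glib-1.0
          CFLAGS=-Wall -pedantic --std=gnu17 `pkg-config --cflags $(MODULES)`
          LDLIBS=`pkg-config --libs $(MODULES)`
          # Every object depends on every header: conservative but simple
          HEADERS=$(wildcard *.h)
          EXE=app

          .PHONY: all clean

          all: $(EXE)

          # Explicit link step: make's built-in link rule only fires when
          # there is an app.o (or app.c). Recipe lines must start with a tab.
          $(EXE): application.o jsonstuff.o otherstuff.o
                  $(CC) $(LDFLAGS) $^ $(LDLIBS) -o $@

          # Compilation itself is left to make's built-in %.o: %.c rule
          application.o: application.c $(HEADERS)

          jsonstuff.o: jsonstuff.c $(HEADERS)

          otherstuff.o: otherstuff.c $(HEADERS)

          clean:
                  rm -f $(EXE) *.o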

      This isn't perfect as it causes a full project rebuild whenever a header is updated, but I've found it's easier to do this than to try to track header usage in files. Also, failing to rebuild something when a header updates is a quick way to drive yourself crazy in C, it's better to be conservative. It's easy enough that you can write it from memory in a minute or two and pretty flexible. There are no unit tests, no downloading and building of external resources, or anything fancy like that. Just basic make. It does parallelize if you pass -j to make.


    • I use makefiles all the time for my projects; projects that are actually built with something else (ex, gradle, maven, whatever). My makefiles have targets for build, clean, dependencies, and a variety of other things. And they also have inputs (like "NOTEST=true") for altering how they run. And then I use make to actually build the project; so I don't need to remember how the specific build tool for _this_ project (or the one of many build tools in a project) happens to work. It works pretty well.

    • The idea that git offers a 'clean' command was revelatory to me. Your build system probably shouldn't need to know how to restore your environment to a clean state because your source control should already know what a clean state is.

      That's sort of essential to serving its purpose, after all.

      I haven't yet run into a scenario where there was a clean task that couldn't be accomplished by using flags to git clean, usually -dfx[0]. If someone has an example of something complex enough to require a separate target in the build system, I'm all ears.

      [0] git is my Makefile effect program. I do not know it well, and have not invested the time to learn it. This says something about me, git, or both.


    • That’s why I usually write them from scratch and don’t let them get over 100 lines long at most. Usually they are around 30 with white space.

      make clean makes lots of sense but is not even strictly necessary. In the world where all it does is find all the *.o files and delete them it’s not a bad thing at all.

I think Makefile is maybe the wrong analogy - the problem with most people and makefiles is they write so few of them, the general idea of what make does is at hand, but the muscle memory of how to do it from scratch is not.

But, point taken - I've seen so much code copy-pasta'd from the web, there will be like a bunch of dead stuff in it that's actually not used. A good practice here is to keep deleting stuff until you break it, then put whatever that was back... And delete as much as possible - certainly everything you're not using at the moment.

  • This is exactly the problem I face with many tools: Makefiles, KVM setups, Docker configurations, CI/CD pipelines. My solution so far has been to create a separate repository with all my notes, shell scripts, example programs, etc. for these tools, libraries, or frameworks. Every time I have to use these tools, I refer to my notes to refresh my memory, and if I learn something new in the process, I update the notes. I can even point an LLM at it now and ask it questions.

    The repository is personal, and contains info on tools that are publicly available.

    I keep organisation specific knowledge in a similar but separate repo, which I discard when my tenure with a client or employer ends.

    • What if your client comes back?

      On a more practical note, what structure, formats and tools do you use that enable you to feed it to an LLM?


The best term for this is Cargo Cult Development. Cargo Cults arose in the Pacific during World War II, where native islanders would see miraculous planes bringing food, alcohol and goods to the islands and then vanishing into the blue. The islanders copied what they saw the soldiers doing, praying that their bamboo planes and coconut gadgets would impress the gods and restart the flow of cargo to the area.

The issue of course is the islanders did not understand the science behind planes, walkie-talkies, guns, etc.

Likewise, cargo cult devs see what is possible, but do not understand first principles, so they mimic what they see their high priests of technology doing, hoping they can copy their success.

Hence the practice of copying, pasting, trying, fiddling, googling, tugging, pulling and tweaking hoping that this time it will be just right enough to kind of work. Badly, and only with certain data on a Tuesday evening.

  • I don't think of this as being cargo cult development. Cargo culting has more to do with mimicking practices that have worked before without understanding that they only worked within a broader context that is now missing. It's about going through motions or rituals that are actually ineffective on their own in the hopes that you'll get the results that other companies got who also happened to perform those same motions or rituals.

    What OP is describing isn't like this because the thing being copied—the code—actually is effectual in its own right. You can test it and decide whether it works or not.

    The distinction matters because the symptoms of what OP calls the Makefile effect are different than the symptoms of cargo culting, so treating them as the same thing will make diagnosis harder. With cargo culting you're wasting time doing things that actually don't work out of superstition. With the Makefile effect things will work, provably so, but the code will become gradually harder and harder to maintain as vestigial bits get copied.

    • I would almost call this the "boilerplate effect".

      Where people copy the giant boilerplate projects for React, K8s, Terraform, etc. and go from there. Those boilerplates are ideal for mid to large scale projects. And it's likely you'll need them someday. But in the early stages of development it's going to impart a lot of architecture decisions that really aren't necessary.


    • Seeing this exact effect where I am currently working. Main available CI/CD tool is a customised and centrally managed Jenkins fleet. It's pretty much impossible to avoid using and seldom needs changed - until it does. Some attempts have been made at centralised libraries and patterns - but even that requires knowledge and study that most won't know is available or be given time to acquire.

      So when the inevitable tweak or change is made it's made in the easiest, cheapest way - which is usually copying an existing example, which itself was copied from somewhere else.

      I see exactly the same in other teams' repositories. Easiest path taken to patch what already exists, as the cost/benefit just isn't perceived to be there to be worth prioritising.

    • > only worked within a broader context that is now missing

      > because the thing being copied—the code—actually is effectual in its own right.

      I don't understand how the second disproves the former.

      In fact, a cargo cult works because there's the appearance of a causal linkage. It appears things work. But as we know with code, just because it compiles and runs doesn't mean "it works". It's not a binary thing. Personally, I find that belief is at the root of a lot of cargo cult development, where many programmers glue things together and say "it works" because they passed some test cases, but in reality code shouldn't be a Lovecraftian monster made of spaghetti and duct tape. Just because your wooden plane glides doesn't mean it's an actual plane.


  • (Author of the post.)

    This is mentioned in footnote 1. Concretely, I don’t think this is exactly the same thing as cargo culting, because cargo culting implies a lack of understanding. It’s possible to understand a system well and still largely subsist on copy-pasting, because that’s what the system’s innate complexity incentivizes. That was the underlying point of the post.

  • For me, there are many cases where I copy-paste stuff I've written in the past b/c some tool is a pain-in-the-ass and I can't afford the mental context switch. I usually do understand what's happening under the hood, but it's still cognitively heavy to switch into that "mode" so I avoid it when possible.

    Tools that fall into this category are usually ops-y things with enormous complexity but are not "core" to the problem I'm solving, like CI/CD, k8s, Docker, etc. For Make specifically, I usually just avoid it at this point b/c I find it hard to avoid the context switch.

    It has nothing to do with miraculous incantations--I know the tradeoff I'm making. But it still runs the risk of turning into the Makefile Effect.

    • It’s always hoped (but rarely shown to be true) that by making templates, teams will put thought into their K8s deployments etc. instead of just copy/pasting. Alas, no – even when the only things the devs have to do is add resource requests and limits, those are invariably copy/pasted. If the app gets OOMkilled, they bump up memory limit until it doesn’t. If it’s never OOMkilled, it’s probably never touched, even if it’s heavily over-provisioned (though that would only matter for the request, of course).

      This has spawned a cottage industry of right-sizing tooling, which does what a dev team could and should have done to begin with: profiling their code to see resource requirements.

      At this point, I feel like continuing to make things easier is detrimental. I certainly don’t think devs need to know how to administer K8s, but I do firmly believe one should know how to profile one’s code, and to make reasonable decisions based on that effort.
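
      As a rough idea of what "profile first, then pick the number" can look like, here is a minimal sketch using Python's standard `tracemalloc` module. `workload` and `peak_bytes` are hypothetical names, and the workload is a made-up stand-in for the real application. Note the assumption: `tracemalloc` only sees memory allocated through Python's allocator, so the figure is a lower bound on what the process would need, not its resident set size.

```python
import tracemalloc


def workload(n: int) -> int:
    """Hypothetical stand-in for the application's real work."""
    data = [i * i for i in range(n)]
    return sum(data)


def peak_bytes(func, *args) -> int:
    """Run func and return the peak Python heap usage seen by tracemalloc."""
    tracemalloc.start()
    try:
        func(*args)
        _current, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return peak


if __name__ == "__main__":
    peak = peak_bytes(workload, 200_000)
    print(f"peak traced memory: {peak / 2**20:.1f} MiB")
```

      A measurement like this, taken on representative input, is a better starting point for a memory request than a value copied from the neighbouring service.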


    • I agree, and I think the key distinction is in understanding. In a cargo cult there's a lack of understanding, whereas I'll often copy and paste code/config I understand to get something done. Usually this is for something I don't do very often (configuring nginx, writing some slightly complicated shell script etc.) I could spend an hour reading docs and writing the thing from scratch but that's likely gonna be wasted time because there's a good chance I'm not going to look at that thing again for a few years.

    • Pretty much this

      And of course every one of those tools has to have their own special language/syntax that makes sense nowhere else (think of all the tools beyond make, like autotools, etc)

      I don't care about make. I don't care about learning make beyond what's needed for my job

      Sure, it's a great tool, but I literally have 10 other things that deserve more of my attention than having my makefile work as needed

      So yeah I'll copy/paste and be done with it


  • The article does kind of mention this in footnote '1', for what it's worth:

    > The Makefile effect resembles other phenomena, like cargo culting, normalization of deviance, “write-only language,” &c. I’ll argue in this post that it’s a little different from each of these, insofar as it’s not inherently ineffective or bad and concerns the outcome of specific designs.

  • I think I fall very much into the "beginner of beginner stages" of understanding programming. It sounds like, if I want to avoid that "cargo cult" mindset, a structured flow of:

    education -> learning -> doing -> failing -> (repeat)

    would be needed, right?

    Does this then mean that, if someone truly wants to "escape the island, and fly the plane" as it were, it comes down to "university is the 'truest' way"?

    Note: Yes, I realize it's hard to speak in absolutes, that there are plenty of exceptions to generalities, and that all people have various degrees of justifications of I-can't-do-that-itus; I'm talking more in terms of optimal theory. That, the optimal route to avoid cult-like behavior is to understand the whole thing, and that "the whole thing" comes from higher education, right?

    Logically at least, it would seem that even diligent studying with books as a means to meet/surpass the "completeness" of university would still be... inadequate in some regard when compared to in-class time with learned educators. (Again, supposing that the same person worked just as hard doing either option, etc.)

  • An engineer should learn first principles and master the tool rather than dancing around it or reaching immediately to replace it with something else. This is why the "replacement" tool "just" is fundamentally terrible: it doesn't do dependency checking and it optimizes for the wrong things. Throwing away the power and simplicity of makefiles (GNU extensions are often needed) is an instant loss of efficiency.

    Instead, (GNU or vanilla) makefiles are ideal for very simple, portable projects. Make is everywhere.

    For anything complicated, use a proper build system that doesn't rely on autotools, such as CMake or Bazel.

Another factor is frequency of use. I use LaTeX to do big write-ups on the order of once per year or less. LaTeX at the level I use it is not a hard tool, but I generally start a new document by copy-pasting a previous document because there is a lot of detail about how to use it that I'm never going to remember given that I only use it for a few weeks once a year.

  • I usually try to avoid the "makefile effect" by learning the technology I use reasonably frequently (e.g. Makefiles, shell scripts, ...).

    However, despite the fact that I used to use LaTeX very much, I always copy-pasted from a template. It is even worse with beamer presentations and TikZ pictures where I would copy-paste from a previous presentation or picture rather than a template.

    For TikZ I am pretty sure that the tool is inherently complex and I just haven't spent enough time to learn it properly.

    For LaTeX I have certainly spent enough time on learning it so I wonder whether it might be something different.

    In my opinion it could very well be a matter of “(in)sane defaults”. Good tools should come with good defaults. However, LaTeX is not a good tool wrt. this metric, because basically all my documents start something like

        \documentclass[paper=a4, DIV9, 12pt, abstracton, headings=normal, captions=tableheading]{scrartcl}
        \usepackage[T1]{fontenc}
        \usepackage[utf8]{inputenc}
        \usepackage[english,german,ngerman]{babel}
        \usepackage[english,german]{fancyref}
        % ...
        \usepackage{microtype}
        \usepackage{hyperref}

    Most of this is to get some basic non-ASCII support that is needed for my native tongue or enable some sane defaults (A4 paper, microtype...) which in a modern tool like e.g. pandoc/markdown may not be needed...

    Hence the purpose of copy-pasting the stuff around is often to get good defaults which a better tool might give you right out of the box (then without copy/paste).

    • Copy-pasting itself is not bad per se. What's bad is copy-pasting without understanding the why and how.

      For LaTeX I also copy-paste a whole lot from older files, but I don't feel bad because (a) I wrote these files before, (b) I know exactly what each line is doing, (c) I understand why each line is needed in the new doc.

      I wrote a relatively large amount of TikZ code earlier in my life (basically used it as a substitute for Illustrator) and for this library in particular, I think it just has so much syntax to remember that I cannot keep it all in my brain for ever. So I gladly copy from my old TikZ code.

    • \usepackage[utf8]{inputenc} now is the default, at least; you don't need to include it anymore. And diacritics work out of the box, no need to write weird incantations like G\"{o}del anymore.

  • I use it more often and also start with copy-paste header, that includes:

    * all packages needed for my language (fontenc, babel, local typography package)

    * typical graphicx/fancyhdr/hyperref/geometry packages that are almost always needed

    * a set of useful symbol and name definitions for my field

    If you are writing anything other than math or pure text in English, LaTeX is batteries not included.

This is “Copy-Pasta Driven Development” [0] and it’s not even related to makefiles. It’s related to the entire industry copying code from here to there without even knowing what they are copying.

TBH I think Copilot has made this even worse, as we are blindly accepting chunks of code into our code bases.

[0] https://andrew.grahamyooll.com/blog/copy-pasta-driven-develo...

  • Blame the business people. I tried becoming an expert in `make` probably at least 7 times in a row, was never given time to work with it daily until I fully memorized it.

    At one point I simply gave up; you can never build the muscle memory and it becomes a cryptic arcane knowledge you have to relearn from scratch every time you need it. So I moved to simpler tools.

    The loss of deep work is not the good programmers' fault. It's the fault of the business people.

    • I wouldn't say so. Make is very simple and you can grasp the basics within an hour or so, if you're familiar with shell scripting (as it's basically a superset of shell scripts, with the dependency graph on top). Then all you have to do is just-in-time learning, which is mostly searching for a simpler pattern than what you're currently doing.

If I had a nickel for every time I have seen a Makefile straight up copied from other projects and modified to "work" while leaving completely unrelated unnecessary build steps and targets in place.

It's a major pet peeve of mine.

  • How do you know what is and isn't related if nothing is documented?

    Trial and error?

    Well have fun with that :p

    • Hum...

      You know, a makefile is documentation. That's why you should probably never copy one (except for a single line here or there). There's space for commenting a few things, but your target names and variables should explain most of what is going on there.

      Anyway, the article and most people here seem to be talking about those autotools generated files. Or hand-built ones that look the same way. But either way, it's a bad solution caused by forcing a problem to be solved by a tool that wasn't aimed at solving it. We have some older languages without the concept of a "project" that need a lot of hand-holding for compiling, but despite make being intentionally created for that hand-holding, it's clearly not the best tool for that one task.

    • Exactly. Bonus points if the person who started the project moved on and you have to be the one to build and maintain it.

    • You find the first part in your stack that is documented (e.g., make is documented, even if your makefile is not) and use that documentation to understand the undocumented part. You then write down your findings for the next person.

      If you don’t have enough time, write down whatever pieces you understood, and write down what parts “seem to work, but you don’t understand“ to help make progress towards better documentation.

      If you put the documentation as comments into the file, this can make copy&pasting working examples into a reasonably solid process.

There are certainly a lot of tools that are more complicated than necessary, but Make as a tool isn’t a good example of that, IMO. With modern tooling, more often than not the complexity problem is compounded by insufficient documentation, the existing documentation being predominantly cookbook-style and not explaining the conceptual models needed to reason about how the tool works, nor providing a detailed and precise enough specification of the tool. That isn’t the case for Make, which is well-documented and not difficult to get a good grasp on, if one only takes the time to actually read the documentation.

The cookbook orientation mentioned above in turn leads to a culture that underemphasizes the importance of learning and understanding the tools that one is using, and of having thorough documentation that facilitates that. Or maybe the direction of causation is the other way around. In any case, I see the problem more in too little time being spent in creating comprehensive and up-to-date documentation on tooling (and designing the tooling to be amenable to that in the first place), and in too little resources being allocated to teaching and learning the necessary tooling.

I wouldn't say this is necessarily a bad thing. I wrote my first version of a Makefile with automatic dependencies and out-of-tree builds 10+ years ago and I have been copying and improving it since. I do try to remove unneeded stuff when possible.

The advantage is that one can go in and modify any aspect of build process easily, provided one takes care to remove cruft so that the Makefile does not become huge. This is very important for embedded projects. For me, the advantages have surpassed the drawbacks (which I admit are quite a few).

You could, in theory, abstract much of this common functionality away in a library (whether for Make or any other software), however properly encapsulating the functionality is additional work, and Make does not have great built-in support for modularization.

In this sense I would not say Make is overly complex but rather the opposite, too simple. Imagine how it would be if in C global variables were visible across translation units. So, in a way, the "Makefile effect" is in part due to the nature of the problem being solved and part due to limitations in Make.
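For readers who have not met the "automatic dependencies" mentioned above: a common setup, assumed here rather than taken from this comment, has the compiler write a small dependency file next to each object (GCC and Clang do this with `-MMD`), and the Makefile includes those files on the next run. The files are ordinary make rules. The sketch below is Python rather than Make so it can be run directly; the function name, the sample paths, and the simplified parsing are all illustrative assumptions.

```python
def parse_depfile(text):
    """Parse make-style 'target: prerequisites' rules into {target: [deps]}.

    Simplified sketch: handles backslash line continuations, but not
    escaped spaces or paths that themselves contain a colon.
    """
    rules = {}
    # A backslash at the end of a line continues the rule on the next line.
    joined = text.replace("\\\n", " ")
    for line in joined.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or ":" not in line:
            continue
        target, _, deps = line.partition(":")
        rules.setdefault(target.strip(), []).extend(deps.split())
    return rules


# Hypothetical content of build/foo.d for an out-of-tree build.
SAMPLE = "build/foo.o: src/foo.c src/foo.h \\\n src/util.h\n"

if __name__ == "__main__":
    for target, deps in parse_depfile(SAMPLE).items():
        print(target, "depends on", deps)
```

The point is only that the header list comes from the compiler rather than from a hand-maintained rule, which is why it stays correct when includes change.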

  • Can you imagine if the makefile was made by someone else and you were suddenly confronted with the result of 10 years of tuning?

    • I am that someone else because I seldom edit the makefiles and I forget things. That's why I try to trim unused targets and recipes and I try to keep it documented.

      In the end it is no different from any code that's suffered from 10 years of tuning and it can get ugly. Maybe Make is even somewhat worse in this respect, but then again it does not need to be changed often.

Is not this a very generic phenomenon? I would argue it applies broadly. For example budgeting, you usually start from last year's budget and tweak that, rather than start from scratch. Or when you write an application letter, or a ServiceNow ticket, or whatever. Now I regret that I have brought in ServiceNow in the discussion, it kills the good mood....

I have observed the Makefile effect many times for LaTeX documents. Most researchers I worked with had a LaTeX file full of macros that they have been carrying from project to project for years. These were often inherited from more senior researchers, and were hammered into heavily-modified forks of article templates used in their field or thesis templates used at their institution.

  • This is a great example of an instance of this "Makefile effect" with a possible solution: use Markdown and Pandoc where possible. This won't work in every situation, but sometimes one can compose a basic Beamer presentation or LaTeX paper quickly using largely simple TeX and the same Markdown syntax you already know from GitHub and Reddit.

    • > use Markdown and Pandoc where possible.

      That won’t solve any problem that LaTeX macros solve. Boilerplate in LaTeX has 2 purposes.

      The first is to factor frequently-used complex notations. To do this in Markdown you’d need to bolt on a macro preprocessor on top of Markdown.

      The second one is to fine-tune typography and layout details (tables are a big offender). This is something that simply cannot be done in Markdown. A table is a table and if you don’t like the style (which is most of the time inadequate) then there is no solution.

I have made conscious effort in the past to never copy/paste the initial fleshing-out of a Makefile or a PHP class, or HTML boilerplate, or whatever. Like, for years I stuck to that. Then I stopped making that effort because there is no upside. Or rather, there is no downside to copy+paste+modify. It's faster and you save your brain power for things that actually matter.

  • There's a subtle difference between a nice template and a fully-working implementation that you then modify though.

    (e.g. in that they were designed with different goals in mind, so the former is likely to have stopped at the point where it was general enough, to save you time, but not too specific to create footguns).

    Bonus points if your template explicitly has fail patterns that prevent your code from silently failing.

> However, the occurrence of the Makefile effect in a simple application suggests that the tool is too complicated for that application.

I interpret it in a bit of different way.

Makefile is relatively simple and unopinionated like a brick. Also makefile defines/reflects project’s structure.

From simple blocks one can build any shape one want. Total freedom.

Problem is, make doesn’t impose best practice and doesn’t steer you clear of common pitfalls of project structuring and building and publishing.

One example for illustration: Out of source builds is rather good idea, but not imposed by make.

So makefile is not enough, one needs all the life-lessons of using make, so inherited makefiles are better than written from scratch.

  • To expand the illustration:

    * using out-of-source builds is a good idea

    * using fully automatic dependencies is a good idea

    * never committing generated files is a good idea (avoid hysteresis)

    It is fundamentally very difficult to get all three of these at once; automatic dependencies often require generating files ahead of time, but generating files often involves needing to know the dependencies or at least their paths ahead of time.

    These days the trend seems to be to commit "generated-in-place" files, which avoids some of the historical problems with the last (at the cost of introducing others). I don't claim this is optimal.

I think LaTeX is the poster child of this. Nobody writes a LaTeX preamble from scratch, you always copy your previous document and tweak it.

  • Don't do that! If you're always using the same preamble, you should turn it into a .sty file. Then the preamble of new documents is just

      \usepackage{myessay}

    • I did that, then I needed to tweak things so I added options, then I needed to use the package somewhere that needed to be self-contained, so I started copy-pasting ;). I've done similar things with makefiles, tox configs, linter settings (all of which started from an initial version I wrote from scratch).

      I suspect the real reason this effect exists is that copy-pasting is the best way to solve the problem, due to a varying mix of: there being no way of managing the dependencies, needing to avoid (unmanaged) dependencies (i.e. vendoring is the same, only we have a tool managing it), the file (or its contents) needing to exist there specifically (e.g. the various CI locations) and no real agreement on what template/templating tool to use (and a template is just as likely to include useless junk). Copy-pasting is viewed as a one-time cost, and the thing copy-pasted isn't expected to change all that much.

    • Then you make a change to myessay.sty and suddenly four of your old files stop compiling.

  • I use Typst now instead and wrote the "preamble" from scratch. (Because Typst is that much less annoying than LaTeX)

  • I guess that there's a very important difference between copying something that you understand (or at least the details of which, like syntax, you can easily remember - here comments become important),

    and copying something that not only you do not understand, but you were not the one that made it in the first place, and you never understood it !

Okay, but to me, copy-pasting working code (even with some extra unused bits) really looks no different from inheriting a library-provided base class and then extending it to one's needs.

That's literally the basis of all software. There is no need to invent "a Makefile effect/syndrome"

Yes, that's an indication that a code sharing mechanism is needed but not implemented. Copy-pasting solves that. You don't expect people to rewrite an HTTP client for every project that interacts with APIs, do you?

  • I think this is a good point. As somewhat of a tangent I have vaguely been thinking of the difference between copy pasting and explicitly extending for a bit.

    It seems that in many cases, adapting copy pasted code has some benefits over importing and adjusting some library code. https://ui.shadcn.com/ is an example of going the copy paste direction. It seems to me this is preferable when tweaking the exact behaviour is more important than keeping up to date with upstream or adhering to an exact standard. If you customize the behaviour a lot the extra abstraction layer only gets in the way.

    This insight might be a bit mundane. But I remember myself bending over backwards a bit too much trying to reuse when copy pasting is fine.

  • Well, I expect people to understand http clients and if things don't work to be sufficiently knowledgeable to recognize when they have a performance problem and figure out why they have it. For that one needs language, library and networking skills which to a degree most developers have because they do it every day.

    At issue, however, are niche skills. We are dealing with the long tail of a distribution, and heuristics which work most of the time might not; the author mentions e.g. security. The way I look at this is risk, i.e. security, bus factor, disruptions due to software moving from the state "works and is not understood" to "broken and is not understood" and, last but not least, the ability to predict the behavior of this niche technology when it is pushed into a larger project.

> the tool (or system) is too complicated (or annoying) to use from scratch.

Or boring: some systems require boilerplate with no added value. It's normal to copy & paste from previous works.

Makefiles are a good example. Every makefile author must write their own functionally identical "clean" target. Shouldn't there be an implicit default?

C is not immune, either. How many bits of interesting information do you spot in the following excerpt?

  #include <stdio.h>
  int main(int argc, char **argv)
  {
    printf("Hello\n");
    return 0;
  }

The printf alone is the real payload, the rest conveys no information. (Suggestion for compiler authors: since the programs that include stdio.h outnumber those that don't, wouldn't it be saner for a compiler to automatically do it for us, and accept a flag to not do it in those rare cases where we want to deviate?)

  • > since the programs that include stdio.h outnumber those that don't

    I don't think that is true. There is a lot of embedded systems C out there, plus there are a lot of files in most projects, and include is per file not per project. The project might use stdio in a few files, and not use it in many others.

  • > Makefiles are a good example. Every makefile author must write their own functionally identical "clean" target. Shouldn't there be an implicit default?

    At some point you have to give the system something to go on, and the part where it starts deleting files seems like a good one where not to guess.

    It's plenty implicit in other places. You can for example, without a Makefile even, just do `make foo` and it will do its best to figure out how to do that. If there's a foo.c you'll get a `foo` executable from that with the default settings.

  • more implicit behaviors more surprises, like security bugs because default functionality or conversions happen

  • > The printf alone is the real payload, the rest conveys no information.

    What are you talking about? Every line is important.

        #include <stdio.h>
    

    This means you need IO in your program. C is a general-purpose language; it shouldn't include that unless asked for. You could claim it should include stuff by default, but that would go completely against what C stands for. Code shouldn't have to depend on knowing which flags you need to use to compile successfully (at least not in general like this).

        int main(int argc, char** argv)
    

    Every program requires a main function. Scripting languages pretend they don't, but they just wrap all top-level code in one. Having that be explicit, again, is important for a low level language like C. By the way, the C standard lets you declare it in a simplified manner:

        int main(void)
    

    Let's ignore the braces as you could just place them on the same line.

        printf("Hello\n");
    

    You could just use `puts` here, but apart from that, yeah that's the main payload, cool.

        return 0;
    

    The C standard actually makes this line optional. Funny but I guess it addresses your complaint that "common stuff" perhaps should not be spelled out all the time?

    So, here is the actual minimalist Hello world:

        #include <stdio.h>
        int main(void) {
            puts("Hello"); /* puts appends the newline itself */
        }

    • Thank you, but this thread was not about writing good code, but rather about how often one ends up uncritically copying existing "legacy" parts without even attempting to understand them.

      I probably used the wrong words: "conveys no information" was meant as "is less meaningful than the printf". Just like switching on the PC every morning is essential, but if you ask me what my job's about, I wouldn't mention it.

      In the same vein, I'm convinced that the printf is the part that expresses the goal of the program. The rest, the #include, the main(), even with the optimizations that you suggested, is just boilerplate, the part that is usually copied and pasted, not because it's not useful and not because it's too difficult to get right, as the original article claims, but because it's boring.

  • My main frustration with make has usually been with too much (surprising) implicit default behavior, not the lack thereof.

This also happens with tools you have to use but don’t get much payoff from—like internal tooling. At work, we have a shitty in-house feature flag service. It breaks all the time and is super finicky. Learning it properly doesn’t really help me, so I mostly copy and paste my way through it.

Another example is jq. I use it occasionally, and ChatGPT handles the syntax pretty well. For me, learning it properly just isn’t worth the time or effort.

  • Makefile syntax is also well understood by ChatGPT. If you want to know a suitable way for doing some task, ChatGPT can do it. It can also explain what another Makefile is doing.

    Here's an example of a (similar) prompt I used recently: "Write me a makefile that executes a script inside a docker container. I want the script to be someprefix-<target-script> that calls /app/<target-script>.sh inside the container."

    I don't have to care about Makefile syntax anymore for the most part.

    • Exactly. I used to go for Just and Go Task as Make alternatives, but not anymore. LLMs are great with Make syntax, and Make is everywhere.

  • > Another example is jq. I use it occasionally, and ChatGPT handles the syntax pretty well. For me, learning it properly just isn’t worth the time or effort.

    This resonates with me, I was in exactly the same position when I needed to do something with `kubectl` JSON output - just ask ChatGPT because I couldn't be bothered to learn the unintuitive syntax.

    Interestingly I _can_ blame the tool, because I started using Nushell[1] which has built-in JSON manipulation that provides a MUCH simpler syntax, and I have learnt this properly because it was that easy.

    1: https://www.nushell.sh/

    • Nushell is awesome. Too bad the incompatibilities are still a bit too much for me to use it as a daily driver.

      It’s easy to blame the tool, but sometimes the problem space is inherently complex. With limited time, building the right abstraction is an immensely difficult job. LLMs fixed this issue for me, and I’ve stopped complaining about unintuitive but ubiquitous tools.

This only happens because people treat build code at a lower standard than app code. IMO you should treat all code with the same rigour. From build scripts to app code to test code.

Why write hacks in build tools when you wouldn’t do in your app code.

We build tool code with the same quality as the app code. That’s why most tooling we use are written in typescript: type safety, code reuse…

  • I would argue the main reason is that Make is just bad. There are easier to use alternatives such as scons or rake that don't have this effect applied to them.

Why do some tools have this problem, and others not?

I think it's convention over configuration. Makefile can do anything, so every project is different and needs different configurations, and everything must be configured. Which means that when I use a tool like that, it's sooo many decisions to make, that I just copy something that I know works.

If instead it was some sane defaults, it would be pretty apparent where it deviates. And instead of thinking of hundred things and which to choose, I either don't think about them, or think "do I have a reason to configure this instead of using defaults?"

Makefiles have an even more interesting issue: They lost their main purpose. In many, many projects that I've seen, they only consist of phony targets. No dependency tracking is used whatsoever.

How many Makefiles are there that just Wrap npm, pip, or some other tool like that? A Makefile is supposed to be the build system, not trigger it.
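To make the distinction concrete, here is a rough sketch of the check that a phony-only Makefile never exercises. It is written in Python rather than Make so it can be run as-is; `needs_rebuild` and `demo` are hypothetical names, and the rule shown (rebuild when the target is missing or older than any prerequisite) is a simplification of what make actually does. A phony target skips this check entirely and runs its recipe every time.

```python
import os
import tempfile


def needs_rebuild(target, prerequisites):
    """Return True if `target` is missing or older than any prerequisite.

    Simplified: real make would first try to build a missing prerequisite;
    this sketch assumes every prerequisite already exists.
    """
    if not os.path.exists(target):
        return True
    target_mtime = os.path.getmtime(target)
    return any(os.path.getmtime(p) > target_mtime for p in prerequisites)


def demo():
    """Walk through three states of a hypothetical foo.c -> foo.o rule."""
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "foo.c")
        obj = os.path.join(tmp, "foo.o")
        open(src, "w").close()
        results = [needs_rebuild(obj, [src])]      # object file missing

        open(obj, "w").close()
        os.utime(src, (1000, 1000))                # source is older ...
        os.utime(obj, (2000, 2000))                # ... than the object
        results.append(needs_rebuild(obj, [src]))

        os.utime(src, (3000, 3000))                # source edited afterwards
        results.append(needs_rebuild(obj, [src]))
        return results


if __name__ == "__main__":
    print(demo())
```

When every target is phony, none of this bookkeeping happens, and the Makefile behaves like a list of named shell commands.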

  • Okay but make is a shitty build system. What it does have going for it is you can nearly universally expect it to be already installed or easy to install. That makes it a good way to name commands shorter in a portable way, with some dependencies maybe thrown in.

    It’s used for the same reason we write shell scripts

    • > It’s used for the same reason we write shell scripts

      Only worse since it also uses $ for its variables leading to "thing:\n\t env FRED=$$HOME/thing some-command -p $$AWS_PROFILE $(OTHER_THING) -d $$(date +%s)" level of squinting

      So for those using it as a task runner from the 70s, without dependency tracking, now it's just laziness over a shell script that has "dependencies" in an imperative and shellcheck-able way

calling this "Makefile" effect is a terrible disservice. one could as easily call it "PHP" effect, "YAML" effect, etc. pick whichever language you'd personally like to denigrate.

there is nothing that makes makefiles inherently more or less susceptible to this. if it's more common, it's because people don't want to take the time doing more solid engineering and clean design for something like a ci/cd config or a makefile, being viewed as ancillary or less important. and so they often don't want to learn the language, so monkey-see-monkey-do.

as sibling comments state, this is better called cargo cult or maybe copy-pasta. and i've seen it with any language c, c++, python, scripts, config files, anything. i even see it in chat gpt answers because it's regurgitating someone else copy pasta.

if it's a problem, it's a problem of discipline.

  • The reason why it seems to apply to makefiles in particular is because most people think life is too short to bother learning and understanding makefiles so it seems to happen there more than anywhere else.

    Also no matter how complicated and subtle you think your makefile is, true experts will tell you it's wrong and you instead copy their apparently over-engineered, hard to understand makefile

    • > Also no matter how complicated and subtle you think your [thing] is, true experts will tell you it's wrong and you instead copy their apparently over-engineered, hard to understand [thing]

      not unique at all to makefiles, probably not even in the top ten [things] that "true" experts like to "help" with

  • I think the article addresses

    > if it's a problem, it's a problem of discipline.

    With this

    > Also note: the title is “be aware,” not “beware.” The Makefile effect is not inherently bad!

> Does it need syntax of its own? As a corollary: can it reuse familiar syntax or idioms from other tools/CLIs?

I’m with the author here 100%. Stop inventing new syntaxes and formats for things that don’t need it. It’s not clever, it’s a PITA when it doesn’t work as expected at 3:30 on a Friday.

I see this effect in Java Maven pom.xml files. It's hard to get a straightforward answer on why each build step is needed, what each attribute means, what parts are optional or mandatory, etc. There seems to be a culture of copying these XML files and tweaking a few things without truly understanding what the whole file means. I briefly looked at Ant and Gradle, and their ecosystems don't look any better. The build configuration files seem to have too much unexplainable magic in them.

  • > I briefly looked at …Gradle… The build configuration files seem to have too much unexplainable magic in them.

    This is largely due to the use of groovy. When the Kotlin DSL is used instead, it can usually be introspected by (eg) IntelliJ. Otherwise, it’s pretty opaque.

    • Bullshit. Groovy can be introspected just as well as Kotlin. And the magic in kts files is still there:

          configure<SourceSetContainer> {
            named("main") {
              java.srcDir("src/core/java")
            }
          }
      

      Unless you know this, there's zero way you will come up with this by typing `configure` and using just auto-completion. Might as well use Groovy and a String for the name of the thing you're configuring. Good tooling would be able to auto-complete from there whether it's Groovy or Kotlin (or Java etc).

  • Imo, the only solution is to avoid boilerplate generators and the parent poms projects like spring boot use for things like pom files: you can look at the boilerplate to get ideas for what might be necessary, but, if you’re starting a project, write the pom yourself. It’s a pain the first couple times, but it gets easier to know what you need.

  • Java would really benefit from a fresh take on the build story. Maven is definitely a tool that suffers from this phenomenon.

  • Honestly for Java I really like Bazel. You should give it a shot. I have a project with a self contained jvm and jars from maven central. Its more explicit than the other options but way less magical IMO.

I guess this is an effect of declarative programming and layered abstractions. The declarative syntax and abstraction are an answer to code being repetitive and long and hard to follow, but this then creates its own issues by making it harder to reason (especially for beginners or occasional users) about what is actually going on. The price for learning how to get it right just becomes much higher with every layer of abstraction in between, because you always have to learn what's going on underneath the "cushions" anyway.

For me typical examples are Terraform configurations with their abstracted configuration syntax, which just mimics some other configuration (e.g. AWS) and executes it in an environment I don't necessarily have access to. Of course I'm not going to run endless experiments by reading documentation, assembling my own config and running it in painfully slow CI pipelines until it works. I'd rather copy it from another project where it works and then go back to work on things that are actually relevant and specific to the business.

I end up doing the copy-paste thing quite a lot with build tools; it was very common in Ant, Maven and then in the Scala build tool. When your projects all have the same fundamental top-level layout and you are doing the same actions over and over, you solve the problem once, then you copy and paste it and remove the bits that don't apply.

With these types of tools there isn't much you do differently. They don't give you much in the way of abstractions; it's just a list of actions which are very similar between projects. Since with them you are typically working in declarations rather than the usual programming primitives, it often fundamentally comes down to "does my project need this build feature or not?".

I always write my makefiles from scratch. At some point in the process, I will google “make automatic variables”, because they’re a pain to memorize.

  • Yeah, I've always been mystified by the idea that writing a new Makefile is some kind of wizardly mystery. Make has its design flaws, for sure, but how hard is it really to write this?

        CFLAGS = -std=gnu99 -Wall
    
        all: foo
        clean:
            $(RM) foo *.o
    
        foo: foo_main.o foolib.o
            $(CC) $(CFLAGS) $(LDFLAGS) $^ -o $@
    

    (Except with tabs, which HN doesn't allow.)

    I haven't tested what I just typed above, but I'm reasonably sure that if I biffed it in a way that makes it nonfunctional, it will be obvious how to correct the problem.

    I mean, not that you can't do better than that (I'm pretty sure anyone experienced can see some problems!), or that there aren't tricky and annoying tradeoffs, but it just doesn't seem like a big activation barrier the way people sometimes make it out to be?

    Maybe those people just need to spend an afternoon once in their life working through a basic make tutorial? Maybe not the first time they work on a project using make, but, maybe, after the fifth or sixth project when they realize that this somewhat primitive inference engine is going to be something they interact with daily for years? At some point you're getting into "lead a horse to water" or "teach a man to fish" territory. There's a limit to how much you can empower someone who's sabotaging themself.

    There's a slightly less minimal example in https://www.gnu.org/software/make/manual/html_node/Simple-Ma... with a full explanation. You can read it in a few minutes, but of course you have to experiment to actually learn it. The whole GNU Make 4.4.1 manual in PDF form is only 229 pages, so you can read it after dinner one night, or on your commute on the train over the course of a few days. And then you'll know the complete rules of the game.

  • it always feels like those are from an era where teletypes would print out the computer output on paper and brevity was key to save ink and paper.

Same with programming: You just copy some old code and modify it, if you have something lying around.

Same with frameworks (Angular, Spring Boot, ...). The tools even come with templates to generate new boilerplate for people who don't have existing ones somewhere.

A better name for this might be the JCL effect, as even experienced mainframe sysprogs copypasta the JCL it takes to build their COBOL programs from a known-good example and then mutatis the mutandis, rather than attempt to build a mental model of how JCL works from the impenetrable documentation and write JCL de novo.

It's no big deal to me to write a small Makefile from scratch. My editor (Emacs) even knows to always use tabs when I hit TAB in a Makefile, removing the confusion of whether I inserted tabs (correct) or spaces (horribly incorrect) on the lines with the commands to build a particular target.

> However, the occurrence of the Makefile effect in a simple application suggests that the tool is too complicated for that application.

The author's overall point is fine (specifically, that one should consider developer cut-and-paste behavior as an indicator of unnecessary complexity in a tool). However, when discussing the designer's perspective, I think the author should have taken a broader view of complexity.

Much of the complexity in Makefiles stems from their generality; essentially, the set of problems to which a Makefile can be a solution. Substantively reducing this complexity necessarily means eliminating some of those use cases. In the case of make, this is clearly possible. Make as a Unix tool has been around for a looong time, and one can look at the early versions for an idea of how simple it could be.

But the rub is, simplifying make doesn't necessarily reduce complexity. Once armed with a simpler, but more limited make, developers are now tasked not only with knowing the easier Makefile syntax, but also knowing when make isn't an appropriate solution, and when and how to use whatever tool exists to fill the gap. Compounding this is the fact that documentation and shared knowledge regarding which tool is appropriate for which problem is much harder to come by than documentation for the tool itself. This can easily lead to the tool choice equivalent of developer cut-and-paste behavior: "so-and-so uses build tool X so I must use it too", "if you're doing (general description of problem) the only build tool you ever need is Y", "I used Z before, so I'm just going to make it work again".

Essentially you can think of make as one "verb" in a sprawling and uncoordinated domain-specific language that targets building things. Developers need some level of proficiency across this language to succeed at their work. But trading complexity that must be mastered in one tool for complexity that must be mastered across tools can very easily increase overall complexity and promote its own kind of "Makefile Effect", just at a different level.

EDIT: Some might prefer the term "Cargo Culting" rather than "Makefile Effect" here. I suggest they are the same behavior just in different contexts.

I see this often in our codebase. It was mostly written by ex-C# developers who were new to writing Go, and there are many ham-handed C#-isms in there. At some point, someone took a guess at how something should be, then subsequent changes were done by copy-paste. Years down the road, another copy-paste job happens, and when I point out that the patterns within are not good (like, can actually be buggy), I get a confused response, because that is what was there.

There is an implicit assumption that the code written espouses best-practices, but that is far from the truth.

  • Happens to us at my day job too. The codebase is primarily C++. My most recent horror story is that I was stepping through some code in an object that was statically initialized and a variable that was initialized as `static const double foo = nan;` had a value of 0 in it. This was very surprising to me.

    I look at how we defined nan and it turns out that nan is a singleton that was initialized in a DLL somewhere. My variable was being initialized before the singleton nan was initialized. I asked around, and someone with access to the previous version control system (we migrated to git in 2016) discovered that this was part of the original commit to that VCS back sometime in 2003-2006 or something. We think that was probably from before our C++ compiler was updated to support C++98 and `numeric_limits` was added.

    So of course I moved this over so that accessing our special nan singleton is just a static constexpr call to `std::numeric_limits<double>::quiet_NaN()`. But our special nan singleton is used all over the place in our codebase. So of course I have to check to see nobody's doing something weird with it.

    Of course they are.

    There are about a hundred lines of code that boil down to `if (foo == special_nan_singleton) { /* ...handle nan */ }` which of course...isn't how nan works. This is a no-op and the compiler just straight up compiles it out of the binary. This happens a lot. Including fundamental infrastructure, like the special JSON serializer somebody reinvented.
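
    A minimal sketch of why that comparison never fires, assuming IEEE 754 doubles. The names `special_nan`, `is_nan_broken`, and `is_nan_fixed` are hypothetical stand-ins, not the ones from that codebase:

    ```cpp
    #include <cassert>
    #include <cmath>
    #include <limits>

    // Assumption: the platform uses IEEE 754 doubles (true on mainstream targets).
    static_assert(std::numeric_limits<double>::is_iec559,
                  "this sketch assumes IEEE 754 doubles");

    // Hypothetical replacement for a runtime-initialized NaN singleton.
    // A constexpr function has no static-initialization-order dependency.
    constexpr double special_nan() {
        return std::numeric_limits<double>::quiet_NaN();
    }

    // Broken check: NaN compares unequal to everything, including itself,
    // so this returns false even when x is NaN.
    bool is_nan_broken(double x) { return x == special_nan(); }

    // Working check: std::isnan asks whether the value is a NaN.
    bool is_nan_fixed(double x) { return std::isnan(x); }

    // Small self-check that can be called from main().
    void check_nan_semantics() {
        const double x = special_nan();
        assert(!is_nan_broken(x));   // the == test misses a real NaN
        assert(is_nan_fixed(x));     // std::isnan catches it
        assert(!is_nan_fixed(0.0));  // and does not flag ordinary values
    }
    ```

    Calling `check_nan_semantics()` from `main` exercises both versions. Note that flags such as `-ffast-math` can change NaN handling, so this only holds under default floating-point settings.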

  • I can imagine these developers being frustrated with highly verbose and often unsound semantics of Go.

  • IDK I feel like Go suffers from this a lot. I have seen a lot of Gava, Guby, and G# over the last few years. It happens in Python a lot as well. Some people just love to write Java in Python and the new type hints make it even easier.

So, it's the tool's fault that the user chose it, and it's the tool's fault the user never learned how it works?

This is like taking a hike up a rocky hill because the trailhead had a smooth path, later tripping over a rock, and then blaming the rock.

I'd redefine the Makefile (or YAML, Bash, etc) effect as:

  Tools that are easy enough that people try to use them
  without learning how they work first, and hard enough that
  people later blame the tool when they crash into their
  own ignorance.

> Complex tools are a necessity; they can’t always be avoided. However, the occurrence of the Makefile effect in a simple application suggests that the tool is too complicated for that application.

This footnote actually made me think about IDEs and the JS toolchain even more than makefiles.

If I'm writing a small project (say, 10 code files) surely an IDE where most people only know how to use 4 of the 1000 buttons is overkill, and I'd use a makefile.

Similarly surely 10 code files with 10 config dotfiles to set up a JS environment and tooling for dependencies, versioning, linting, transpiling, etc is overkill too.

- Basic javac/gcc/swiftc/whatever commands are simple, even if they can scale up through every niche via configuration options.

- Basic makefiles are simple, even if they can scale up to something like the xnu makefile tree (the most complex make system I've encountered).

- Let's not talk about JS.

I'm hesitant to use the word "lazy" to describe people who do what the author is describing - not just because I sometimes do it myself but because I believe that laziness is a derivative observation of time constraint, executive function exhaustion, and other factors. It also reminds me of the classic "I'm going to learn X, which handles/wraps Y, so that I can avoid learning Y", which is generally a bad pattern of motivation.

At its core this feels like a failure to understand (or failure of others to teach) fundamentals / first principles of the tools being used.

> Think about CI/CD setups, where users diagnose their copy-pasted CI/CD by doing print-style debugging over the network with a layer of intermediating VM orchestration. Ridiculous!

I don't think the author understands the point of "CI/CD systems". And I don't really blame them, because workload orchestration systems have been abused and marketed to the point where we call them CI/CD systems instead. Sure, if you think the point of CI/CD is to just provide a unified build and deploy it somewhere, you can write that in whatever language you like, and not need to know a bunch of YAML-fu.

But the whole point of workload orchestration systems is to configure the flow of workloads between machines as they inherently execute on different machines. The status quo is to debug over the network because, fundamentally, different machines will be connected by a network and the workload orchestration system is figuring out which machine to put it on.

If you think you can just run your CI/CD on a single server without networking or virtualization, I have some very large, parallelized testing suites to show you.

  • > If you think you can just run your CI/CD on a single server without networking or virtualization, I have some very large, parallelized testing suites to show you.

    Nowadays you can get a single server with 256 cores and several terabytes of memory. I would be interested to learn what kind of testing suites have actual needs beyond that.

    Without virtualization though is definitely no problem. The whole docker/k8s/whatever shtick is mainly because devs think it's more fun to invent new things than to learn how to use old ones properly. At least as long as you're running your own code on your own hardware, there is not a single thing solved by virtualization that wouldn't be solved equally well (or better) with traditional tools like environment modules and Slurm.

    • For a start, any suite that takes >X hours on a single node, especially compounded if you have a large team of developers.

      > At least as long as you're running your own code on your own hardware

      Assuming you keep a consistent env/OS across all nodes you will want to run said code. Which can be difficult, even just between two users on a single node.

      Not to mention the fact that a lot of (most?) code needs to (A) interoperate with other people's code and (B) at least sometimes run on other hardware.

    • And you can get desktop workstations with similarly high core counts and RAM. You're missing the point. Your strategy can either depend on whether or not you can afford to buy out larger and larger vertically scaled servers, or you can plan for horizontal scaling. Almost nobody is willing to sign off on the vertical scaling strategy because the sheer presence of a ceiling frightens executives.

      And yes, in enterprise, two things are usually at play: you need to test a system where the architecture includes the combined architecture of multiple corporate acquisitions, more than one of which were Vertical Monsters, and more than one of which presumed horizontal scaling; and where deployment scripts must be run from behind a no-ingress-permitted firewall, which means having workload orchestration runners installed behind that firewall.

This is poor reasoning and poor argumentation.

This particular line is wrong in my opinion:

> the tool (or system) is too complicated (or annoying) to use from scratch

Unless you're doing trivial things, any tool or system will require some setup (which people call "ceremony").

Tools and systems can be easy to use from scratch only if they are either super-specialized or they impose significant constraints on what you will be doing with that tool and how you will be doing it.

Such tools are usually very tightly coupled to a specific job/environment/task and are hard to keep around, keep updated and evolve.

Make and similar tools instead are generic and can be adapted. The fact that you can reuse the previous work done is actually a feature. You can dive as deep as you want or need. They're very widely used so it's not an issue to keep updated. They're so widespread you can find people already familiar with those tools. Learning such tools is a great investment because you can keep using them over and over across projects and companies.

Some of those tools are either timeless (GNU make) or have a very long life (more than a decade, which is very long for this industry).

Anecdotal example: I learned a bit of Apache Ant while in high school because my laptop at the time (a netbook with 1GB RAM and an Atom processor) could not run NetBeans decently, so I had to resort to writing and maintaining my own build.xml file. Fast forward 14 years and I see a build.xml file in the $FAANG codebase I was working on. That learning did pay off beautifully many years later.

The article is shortsighted; if anything, it's promoting a shallow way of working. You are supposed to learn about the tools you use.

The tacit assumption of the OP is that it is better to do something else. That is, to start with some sort of first principles and create from scratch the artifacts that your project needs. It is telling that his example is a build artifact - these tend to change infrequently. The only way for one human mind to truly "dwell" in the space of project builds is to maintain many of them at once, as many as it takes to fill your days with nothing but build concerns.

"Tools that enable this pattern are harder to use securely..." Harder than what? A totally custom build? One made from first principles? I would, in fact, argue the exact opposite. And in fact I would argue that copy-paste-modify serves practitioners very well, especially when it comes to on-boarding. If you disagree, do the gedankenexperiment where you imagine joining a team with a totally custom build versus one with a lightly edited common make file. Which experience would you prefer, all else being equal?

Copy+tweak happens IRL all the time. There's no reason everyone who bakes should have to reinvent biscuits from scratch. There's no reason chip manufacturers should have to reinvent N-type P-type sandwiches from scratch. The existence of adaptations of previous success does not suggest that baking, or physics, or Make, is overly complicated.

Make has to be one of the more unfairly maligned languages out there. Most “replacements” purport to solve problems make doesn’t have, and are strictly worse than make at what they do.

Anyway, the GNU make manual is a good read for anyone that needs to edit a makefile or design a project build. So is “recursive make considered harmful”.

I think this is completely normal for tools that you seldom program. I write makefiles a couple of times a year. I've been using make for more than 40 years now and I use it every day, but I seldom program it, and when I want something more than simple dependencies I often clone something that already works.

amazon's internal build tool experiences this same phenomenon. engineers are hired based on their leetcode ability, which means the average engineer has gaps in their infrastructure and config tool knowledge/skillset. until the industry's hiring practices shift, this trend will continue.

  • As an undergrad, I did group projects with people who quite literally could not compile and run any actual project on their system outside of a pre-packaged classwork assignment, who essentially could not code at all outside of data structure and algorithm problem sets, who got Google internships the next semester.

    But they were definitely brighter than I when it came to such problem sets. I suppose we need both sorts of engineer to make great things

A design philosophy called "Progressive Disclosure" tries to tackle this problem, where a tool is supposed to present a minimal set of functionality initially to allow a user to be productive without being an expert and progressively "reveal" more complex features as the user becomes more familiar with the tool and attempts to do more complex things.

I've heard the programming language Swift followed this philosophy during development, though I've never written any Swift code to know how well it worked out.

https://en.wikipedia.org/wiki/Progressive_disclosure

At first I couldn't understand what this article is saying. Then, .SECONDEXPANSION: kicked in!

Honestly, my .zshrc file started out as a .kshrc file that was passed down to me by an older developer about 20 years ago, when I was still in university. I've added and removed a lot of things over the years, but there are still a few parts of it that I don't totally understand, simply because they work and I've never had a reason to think about them. The guy I got it from, in turn, got his from someone else.

In the old days, I had a .fvwm2rc config file that I got from my boss in the university computing center. I had no idea how it worked! And neither did he -- he got it from a professor when he was in university.

This is pretty thought provoking. I think the issue is "80% of the use of this complicated tool is for very simple ends". From there you get a lot of "I can't be bothered to learn git/make/sed/helm/jenkins, all I'm doing is X 15 minutes a year". My guess is SWEs hate ceilings, so we don't want to use tools that have them, even though they'd be far more fit for purpose. We also don't want to build tools with ceilings: why limit your potential userbase/market?

This is how I feel about systemd unit files for things that I used to use crontab for. They aren't even particularly complicated. But editing cron files was self-explanatory. I learned it once and I did not need to look it up ever. Whereas, systemd unit files, I still have to look up every single time. There's something wrong with that. They are of course very much superior in many ways... but not all.

To me it seems fine that a tool that is both complex and versatile needs a config file that is beyond memorization. So I think this line of reasoning has limitations.

I could see it with say CLI tools though. Like if I need to reference my notes for a CLI command then that may well indicate a failure in tool design.

>repeatedly copy a known-good solution and accrete changes over time.

Alternative phrasing would be that it evolves. Arguably there is a positive trajectory there

Good points in general.

On the other hand, there are cases where (beneficial/desired) verbosity prompts copy-paste and tweaking - not due to complexity but from some form of scale or size of the input.

In many cases this is a sign of something that should be dynamic data (put it in a db instead of conf) but that's not always the case and worth the tradeoff in the moment.

> However, at the point of design, this suggests a tool design (or tool application) that is flawed: the tool (or system) is too complicated (or annoying) to use from scratch.

As someone who teaches and sees college-level students ask chatgpt what's 1 + 1, I disagree that it has anything to do with complexity or annoyance.

Humans be humans; that's mostly it.

I do write makefiles de novo (including in corporate settings). But I start "backwards" with the "clean" and "distclean" targets, then get a single basic debug build target working. From there, I find it relatively easy to expand to larger and more complex operations. Brick by brick.

This happens to me all the time with bazel. It is too complicated and the documentation sucks so I just look for prior art and copy paste that. Sometimes I have to do something novel and it takes me several days of deep diving the bazel source code to figure out how to do something.

What is a counterexample?

I cannot think of a single tool which is complex enough but does not show the makefile effect

I dunno they're using copy-paste as if it's a bad thing.

There's no virtue in writing something character by character.

If anything these languages and systems are better for the Makefile effect.

That you can't really do effective copy-pasting in imperative languages is a weakness.

  • The author mentions that copy-pasting code by itself is not a bad thing. The problem with the phenomenon they describe is that people copy-paste files around because they _don't understand them_, and end up with stuff that works, but is inefficient and hard to debug.

The traditional Unix man page or list of options output with --help is often a firehose of details that most devs will never use. Sometimes there are a few examples shown of common use cases which is a good place to start learning the tool.

  • Sure, but IME even when the tool in question is incredibly well-documented (like Django, or some other popular library), and has plenty of examples, most still don’t read the docs.

    I don’t know how to deal with that mentality. I don’t mind showing someone how I came to an answer, but I also expect them to remember that for the next time, and do some searching of their own.

There's another similar concept that sort of goes along with this.

Sometimes it's better to duplicate code rather than make a special routine to do it.

Sometimes it's not only easier to copy/paste, but it is better than adding a level of abstraction

Common denominator is that people want a complex system to instead be something that can meet their simpler needs without being a time sink.

Ironically enough, I tend to write my Makefiles from scratch nearly every time, which means I have little reason to consider this term fitting.

Alias, cmake effect.

  • Let's keep "cmake effect" for "trying your best to not have to touch the language and repeatedly looking for something else whenever you do"

    It's a very microsoft feeling pile of crap

Yeah I'm just not wasting my life (or professional time) learning Groovy, Maven, DotNET project files, DotNET 4.8, Gradle, Azure DevOps, Grafana, Prometheus, Docker, Docker compose, Kubernetes, Jenkins, et al.

I need those things once at project setup. I copy-paste and change a bit.

Why copy-paste? It's a proven strategy with a high success rate despite little effort behind it.

I also don't want to learn templating for every little DSL I need to write one file per project with.

But if you love doing it "the right way", you're welcome to do that work for me.

  • Any long-lived project's build will have to be updated/improved at various times throughout its lifetime so there needs to be somebody around who truly understands the build.

Someone remind me, is it $@ or $< ?

  • In a given rule:

        foo.o : foo.c
                $(CC) $(CFLAGS) -o $@ $<
    

    The "$@" is the output (or target, think of @ as a bullseye on a target), and the "$<" is the input (think redirection). The only other commonly used variable is "$^":

        foo : foo.o util.o other.o
                $(CC) $(LDFLAGS) -o $@ $^ $(LDLIBS)
    

    In this case, "$^" means "all inputs".

Most of my simple C projects have make.sh instead that has something like:

    clear
    gcc some options -o foo && ./foo

  • You might benefit from make, as you wouldn't need a full rebuild every time, or have to spell out every step.

    • For C and C++* projects under ~100k lines I wouldn't bother with incremental builds - I have a 70k C project with a single translation unit that builds in under 1s on my machine.

      * C++ requires some discipline to not explode build times, but it can be done if you don't go nuts with templates and standard headers.

It’s like a variant of Pournelle’s Law.

  • i think you got the wrong eponymous law, pournelle's iron law of bureaucracy (which i see happening all the time, btw.) has nothing to do with this issue.

In other news I like the coherence of your online digital identity and websites.

Reminds me of the early internet. Auras of class, cred, erudition, intelligence, mystery, imagination. Thank you.

There are many angles.

Part of the low-code/no-code story is that conventional programming requires programmers to not just decide what has to be done but in what order to put those things. (This is connected with parallelism because if tasks are done in a particular order you can't do more than one at a time.)

An Excel spreadsheet is different from a FORTRAN program, for instance, because you can write a bunch of formulas and Excel updates the results automatically without you sequencing things.

https://en.wikipedia.org/wiki/Topological_sorting

is an easy approach to finding a valid order to do tasks in. It's used frequently in build systems but rarely in other software so it contributes to build systems seeming to be from another planet
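
A minimal sketch of that idea, using Kahn's algorithm (one common way to do a topological sort). The `DepGraph` type and `build_order` function are made-up names for illustration, and this is not how any particular build tool implements it:

```cpp
#include <cassert>
#include <map>
#include <queue>
#include <string>
#include <vector>

// Maps each target to the targets it depends on (hypothetical data shape).
using DepGraph = std::map<std::string, std::vector<std::string>>;

// Kahn's algorithm: returns the targets in an order where every dependency
// comes before the things that need it. Returns an empty vector if the
// graph contains a cycle.
std::vector<std::string> build_order(const DepGraph& deps) {
    std::map<std::string, int> unmet;  // count of not-yet-built dependencies
    std::map<std::string, std::vector<std::string>> dependents;  // reverse edges

    for (const auto& entry : deps) {
        unmet[entry.first] += 0;  // make sure every target has an entry
        for (const auto& prereq : entry.second) {
            unmet[prereq] += 0;   // leaf files may not appear as keys
            ++unmet[entry.first];
            dependents[prereq].push_back(entry.first);
        }
    }

    // Anything with no unmet dependencies can be built right away.
    std::queue<std::string> ready;
    for (const auto& entry : unmet)
        if (entry.second == 0) ready.push(entry.first);

    std::vector<std::string> order;
    while (!ready.empty()) {
        const std::string done = ready.front();
        ready.pop();
        order.push_back(done);
        // Finishing one target may unblock the targets that depend on it.
        for (const auto& d : dependents[done])
            if (--unmet[d] == 0) ready.push(d);
    }

    assert(order.size() <= unmet.size());
    if (order.size() != unmet.size()) order.clear();  // leftover nodes: a cycle
    return order;
}
```

Given a graph where `foo` depends on two object files and each object file depends on its source, the result lists the sources first, then the objects, then `foo`. Nothing in the input says which order to do things in; the order falls out of the dependencies, which is the part that feels unlike ordinary imperative code.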

---

I work in Java a lot and I used to hate Maven, because, if you look at Maven as "an XML vocabulary" you're going to frequently find "you can't get there from here" and looking for answers on Stack Overflow is going to dig you in deeper.

The right way to think about Maven is that, like (part of) Spring, it is a system for writing XML files that configure a group of Java objects. Seen that way, writing a Maven plugin should be a second resort; the moment you're scratching your head wondering if it's possible to do something, you should (1) make sure you can't "go with the flow" and follow conventions, then (2) write a Maven plugin.

The thing is, a Maven plugin is just an ordinary Java class. You're a Java programmer, you know how to do things in Java. All the skills you use every day apply; you're no longer in a weird, mysterious and limited environment. That's part of the "makefile problem": you probably build your code (edit files in Java, C, whatever) 1000s of times for every time you change something about your build system. On a team you can be very productive if you know the main language but have no idea about how the build works (if the build works).

When you try this though you often run into political problems. In most places, for instance, only a few people on the team have the authority to create new Maven projects (a plug-in is a class defined in its own project). Maybe that makes sense because you don't want them breeding like rabbits, but generally most systems are badly partitioned as it is, and I think many programmers wouldn't want to have the fight it would take to create a new project.

People are accustomed to builds being SNAFU and FUBAR.

When I first saw Maven I was baffled that, as a remote worker on a system that had about 20 programmers and about 20 projects, I couldn't build the system successfully at all. The build worked maybe 70% of the time at the office and people didn't seem to worry about it. I could live with that because they were building large complex systems that they were always throwing away and I was building simpler spike prototypes that worked.

I worked at a number of places where builds were unreliable in ways that seemed quantitative rather than qualitative. Eventually I realized the problem was really simple: if you were using snapshot builds, a lot of people were working on a project, and you didn't have any transaction control, you would often get a set of snapshots that were not compatible with each other.

Most teams don't take builds seriously enough. I've found it rare to meet an engineering manager who can answer the question "how long does the build take?" and fantasize of going to a job interview, asking that question, and if I don't get an answer, standing up and walking out.

For many projects I've worked on I've built "meta-build systems" where the assumption is that you have 12 Maven projects and 4 npm projects or something like that (aren't most of us using mixed languages in the React age? why isn't this treated as a first-class problem?), and such a system can be smart about what gets built and what doesn't, what is running out of snapshots and what can have a fixed version, etc. It merges in changes from develop automatically, and if seven steps are required that take a total of 15 minutes I ought to be able to think about something else entirely for 15 minutes and hear a "beep" when it's done.
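
The "what gets built and what doesn't" part can be sketched roughly like this. It is only an illustration under assumptions, not the actual system: `ProjectDeps`, `needs_rebuild`, and the project names in the usage note are all hypothetical, and a real tool would also have to handle versions and snapshots.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Maps each project to the projects it depends on (hypothetical data shape).
using ProjectDeps = std::map<std::string, std::vector<std::string>>;

// Returns the changed projects plus everything that depends on them,
// directly or transitively. Everything else can be left alone.
std::set<std::string> needs_rebuild(const ProjectDeps& deps,
                                    const std::set<std::string>& changed) {
    // Invert the edges: for each project, which projects use it?
    std::map<std::string, std::vector<std::string>> dependents;
    for (const auto& entry : deps)
        for (const auto& used : entry.second)
            dependents[used].push_back(entry.first);

    // Walk outward from the changed projects along the inverted edges.
    std::set<std::string> dirty(changed.begin(), changed.end());
    std::vector<std::string> pending(changed.begin(), changed.end());
    while (!pending.empty()) {
        const std::string project = pending.back();
        pending.pop_back();
        for (const auto& user : dependents[project])
            if (dirty.insert(user).second)  // true only the first time we see it
                pending.push_back(user);
    }

    assert(dirty.size() >= changed.size());
    return dirty;
}
```

With a graph where `web-ui` depends on `api-client`, and both `api-client` and `batch` depend on `core`, a change to `api-client` marks only `api-client` and `web-ui`, while a change to `core` marks all four. An unrelated `docs` project is never touched.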

Trouble is we don't take builds seriously enough: it's a technical problem and it's a political problem and we often don't face the technical problems because of the political problems.

Setup.py, cron, makefile, bash scripts, GitHub actions, and devcontainers all had this effect... Until AI came around.

Now AI does a great job of getting you 90-100% of the way there.

Is it not a problem which is basically COMPLETELY SOLVED by LLMs ?

The reason this happens is that Makefiles (or CI/CD pipelines, linter configs, bash scripts) are more or less a "complete language" on their own, one that is not worth learning when you can do ... exactly what the author says (copy/pasting/modifying until it works) 99% of the time.

But LLMs in general know the language so if you ask "write a minimal Makefile that does this" or even "please simplify the Makefile that i copy/pasted/modified", my experience is that they do that very well actually.

  • Completely solved? I'd say exacerbated beyond recognition. We have tools to let us get by so much farther without understanding anything, so it probably becomes less of a problem in more cases. But it basically guarantees that all but the most curious will not understand how the system actually works. Everything becomes magical copy/pasting from the most advanced information retrieval system with LLMs.

    • But an LLM is literally a "person in the room" that actually knows how it works.

      The simplification and explanation abilities of chatGPT are off the charts in precisely these cases. I honestly don't understand why I'm being downvoted.

      You wanted a simpler make file before? Go spend 2 days learning the make syntax.

      You want a simpler make file now? Just ask ChatGPT for it, along with an explanation of all the concepts, and you'll get it.