Comment by ericbarrett

5 years ago

Another thing Martin advocates for is not putting your name in comments, e.g. "Fixed a bug here; there could still be problematic interactions with subsystem foo -- ericb". He says, "Source control systems are very good at remembering who added what, when." (p. 68, 2009 edition)

Rubbish! Multiple times I've had to track down the original author of code that was auto-refactored, reformatted, changed locations, changed source control, etc. "git blame" and such are useless in these cases; it ends up being a game of Whodunit that involves hours of search, Slack pings, and what not. Just put your name in the comment, if it's documenting something substantial and is the result of your own research and struggle. And if you're in such a position, allow and encourage your colleagues to do this too.

Better put such a long explanation there that your name isn't needed any more. Because if it is your name that makes the difference chances are that you have left the company by the time someone comes across that comment and needs access to your brain.

  • Sometimes what is interesting is that you have found that another engineer—whom you might not know in a large enough organization—has put time and thought into the code you are working on, and you can be a lot more efficient and less likely to break things if you can talk to that engineer first. It's not always the comment itself.

    • Sure, but in IT the rule of assumption should be that the code will outlive the coder. If being able to talk to other engineers is going to make the difference (instead of just being an optimization) then you already have problems.

      3 replies →

    • Yes, that is exactly the thesis of the classic paper "Programming as Theory Building"[0], that it is the theory in the human's head that we pay for and that the complete history of the code is often not enough to effectively modify a program.

      [0]https://pages.cs.wisc.edu/~remzi/Naur.pdf

  • Or have gotten busy since. I worked at Coinbase in 2019 and saw a comment at the top of a file saying that something should probably be changed. I git-blamed and saw it was written six years earlier by Brian Armstrong.

I think most of what martin says is rubbish, but this is not. I have never had `git blame` fail...ever. I know what user is responsible for every line of code. Doing this is contemporaneous. Its right up there with commenting out blocks of code so you don't lose them.

  • I don't know what to say, this is a real problem I have encountered in actual production code multiple times. Any code that lives longer than your company's source control of choice, style of choice, or code structure of choice is vulnerable. Moreover, what's the harm? It's more information, not less.

    • >Moreover, what's the harm? It's more information, not less

      If your code is outliving your source control system you got bigger problems than whatever is in your comments. I can confidently say that in 25 years in this industry this isn't something I've encountered. So probably sufficiently low enough probability to safely ignore.

      >Moreover, what's the harm? It's more information, not less.

      Too much information is every bit as bad as too little.

      9 replies →

  • Code never gets moved or refactored by someone other than the original author?

    • So then you checkout at the refactor commit and look through the blame to continue searching. If you have to repeat this more than a few times then the person has probably left the company or hasn't touched the code in years so its better to understand it yourself before modifying.

      2 replies →

    • If it gets moved then the blame will tell me who moved it, It will also tell me what the hash was before it was moved. That hash will have all the original information. Same for the refactor case.

  • The parent's comment holds when reformatting, especially in languages with suspect formatting practices like golang, where visibility rules are dictated by the case of the first letter (wat?) or how it attempts to align groups of constants or struct fields depending on the length of the longest field name. Ends up in completely unnecessary changes that divert away from the main diff.

It does require some rigor in your version control practices, but I'm happily "git blame"-ing twenty year old code every now and then.

What is even the downside of adding a few extra characters to the end of a comment to show who wrote it?

And has Martin ever even worked on large, non-greenfield projects? That's the only way I could see anyone professing such idealism.

  • > What is even the downside of adding a few extra characters to the end of a comment to show who wrote it?

    You're right - in fact we should do this for every line of code, so that we know of whom to ask questions!

        func main() { // jen20
            fmt.Println("Hello World") // ziml77
        } // jen20
    

    What's the downside of adding a few extra characters!?

    Of course, this view is already available to people: `git blame` - and it's the same for comments, so there is no need.

    The exception is "notes to future self" during the development of a feature (to be removed before review), in which case the most useful place for them to appear is at the _start_ of the comment with a marker:

        // TODO(jen20): also implement function X for type Y
    

    Then they are easy to find...

    • What you've shown is code clutter and reductio ad absurdum to what I wrote in the top-level comment. I am speaking of architectural comments, bug-fixes, and especially in service architecture where unusual and catastrophic interactions might happen (or have happened) with code that's not under your control.

    • >You're right - in fact we should do this for every line of code, so that we know of whom to ask questions!

      if you want to live in this kind of absolutism, I would rather have the name on every line than no comments at all.

      1 reply →

I think your comment is controversial, for a number of reasons. One, I think nobody should own code. Code should be obvious, tested, documented and reviewed (bringing the number of people involved to at least two), the story behind it should be either in the git comments or referenced to e.g. a task management system. Code ownership just creates islands.

I mean by all means assign a "domain expert" to a PART of your code, but no individual segment of code should belong to anyone.

Second: There's something to be said about avoiding churn. Everybody loves refactoring and rewriting code, present company included, but it muddles the version control waters. I've seen a few github projects where the guidelines stated not to create PRs for minor refactorings, because they create churn and version control noise.

Anyway, that's all "ideal world" thinking, I know in practice it doesn't work like that.

  • Maybe not exclusive ownership, but there are always going to be those more familiar with a section of code that others.

    It's not really efficient to insist everyone know the codebase equally, especially with larger codebases.

I never found people adding a name useful.

Either the code is recent (in which case 'git blame' works better since someone changing a few characters may or may not decide to add their name to the file) or it's old and the author has either left the company or has forgotten practically everything about the code.

If you have to track down the author, then it is already bad. The code should not hope that the author never finds a new job.

  • But sometimes it is bad, and not fixable within the author's control. I occasionally leave author notes, as a shortcut. If I'm no longer here, yeah you gotta figure it all out the hard way. But if I am, I can probably save you a week, maybe a month. And obviously if its something you can succintly describe, you'd just leave a comment. This is the domain of "Based on being here a few years on a few teams, and three services between this one, a few migrations etc etc". Some business problems have a lot of baggage that aren't easily documented or described, its the hard thing about professional development especially in a changing business. There's also cases where I _didnt'_ author the code, but did purposefully not change something that looks like it should be changed. In those cases, without my name comment, git blame wouldn't point you to me. YMMV.

  • A 1000 times this. We never use git blame - who cares? The code should be self-explanatory, and if it's not, the author doesn't remember why they did it 5 years down the line either.

    • I rarely want the author name. What I want is the commit message explaining the changes, and I want to see what the code used to do.

      Code without history is nearly unsupportable without reverse engineering it all.