Comment by skrebbel
2 days ago
Cool story! If I may rant off topic a bit though, it boggles my mind that people put stuff like this:
> [This patch] fills holes in executable sections with 0xd4 (ARM) or 0xef (MIPS). These trap instructions were suggested by Theo de Raadt.
into commit messages, but not in the code. What's the cost? What's the downside to having a 2 to 3 line comment above a constant in a C file? Why pretend like all this is super obvious when clearly it isn't?
There seems to be some unwritten cultural rule, particularly in FOSS land, that you're supposed to write code as if you're an all-knowing oracle, as if everything is obvious, and so comments are for losers (as are descriptive variable names). I simply can't understand how and why that developed.
I really like putting context into commits, not into code comments. The reasoning is pretty simple: Comments aren't checked. I might write "This is done this way because John Doe suggested it, it's much more efficient this way", and then someone else changes the code to be buggy, wrong, and slow. Now, the comment is explaining behavior that is no longer there, and wrongly suggests that the code does/means something it doesn't.
Another argument is comments-as-noise, as I would call it. The more "unnecessary" comments you write, the more core developers (who write and read most of the code), will learn to ignore comments. Then, critical comments like "Be careful not to call this when XYZ isn't initialized yet, unless you don't mind ABC happening" are ignored, and ta-da! comments are now useless.
Commit messages are attached to specific changes. If I want to know why a line of code is the way it is, I can git blame it, and see which commit is to blame, together with issue numbers, authors, maybe reviewers, context, history, etc.
Should there be a comment briefly explaining this patch? Probably. But the commit message should add the other context.
Maybe there should be a convention around comments which describe functionality, those which describe history, and I’m sure we can find other types of comments. Then we can have our editors hide certain types of comments based on what we want to see.
The issue is really that the comments' semantics aren't validated. That might be an AI startup idea; going through a codebase and linting comments.
Git blame won't show you the history you care about if the line is changed in the future.
It does, you can use various parameters to go through the history of a line of code. And, usually, you only care about why it is the way it is.
I think this is saying there is a habit of updating code without reading and updating the comments associated with the code. I would argue the fix is to have people get in the habit of maintaining comments, as opposed to not writing any comments at all.
If people already have the habit of ignoring comments that are right there in the code, I am not sure they would spend the extra effort to go after commit history. Also, some commits might have originated from private repositories where commit history is not accessible, and the most context we get out of "git blame" might be "code was imported on this date".
It’d be nice if comments were always updated, but the reality of it is that they often aren’t.
Sometimes it’s because the later developer doesn’t think the comment needs to be fixed - maybe they tried to fix a bug in John Doe’s approach, accidentally introduced a new bug, but thought they didn’t touch the clever algorithm.
Sometimes it’s because the comment isn’t proximate to the code it refers to. For example, in the “XYZ initializer” case, maybe XYZ is changed down the line to remove the ABC behaviour, but the comment stays because it is attached to some faraway usage of XYZ.
Notes in commit messages don’t fix either of these problems, obviously. But, on the other hand, they obviously refer to a specific point in time, unlike comments, which makes it easier to figure out if the notes are still relevant or not.
You don't have any control over the people who touch the code after you so you cannot "fix" the risk that someone updates your code without the comment. You do have control over your own commit though.
> What's the downside to having a 2 to 3 line comment above a constant in a C file?
That the code gets changed in the future such that the comment is a lie[1]. This happens with shocking regularity. Comments sound like a great idea until you deal with them in a long term maintenance situation.
As a corrolary, they also increase the cost of maintenance because if you end up doing refactoring that makes the comments a lie, it's never acceptable in review to just remove them. People expect you to write the same kind of treatise that the original author did. And original authors are horrifyingly verbose. All those doxygen crumbs you're leaving only act to confuse and irritate the pour souls coming after you.
Code is code. It should explain itself. If it does not, comments should do the absolute minimum needed (c.f. citing the relevant section in the ISA reference by number in this case) to rectify that.
[1] c.f. Guy Steele: https://softwarequotes.com/quote/don-t-get-suckered-in-by-th...
A lot of developers think code should be self-documenting, which I fully agree with. Unfortunately though I don't think I've ever worked on a project that was actually self-documented, even though that is what the leads wanted.
It's also not always feasible. How would you write self documenting code that explained what the commit message did?
But this works as intended? The code isn't cluttered with documentation, that doesn't necessarily makes sense when reading the code, but by reading the commit, one can understand why the code was written like that.
Cluttered? A sentence describing a magic value is not clutter.
I'm not necessarily disagreeing with you (because apparently this is missing), but a descriptive constant/variable name would be even less clutter than even a 1-line comment
1 reply →
Huh? Quoting a bit more from the article:
> [W]e find this in ARM.cpp:
> trapInstr = {0xd4, 0xd4, 0xd4, 0xd4};
The only thing left to explain is that the trap instruction is used as padding, but you can’t tell from here if that’s obvious or not. Opening the actual code[1], we see that the occurrences of trapInstr are all along the lines of
> void ARM::writePlt( /* ... */ ) {
> /* ... */
> memcpy(buf + 12, trapInstr.data(), 4); // Pad to 16-byte boundary
which isn’t the absolute best, but seems clear enough (if of course you know what a PLT is, which you should if you’re writing a linker).
I do think this merits an explanation that we’re using (what’s intended to be) a trap because the traditional option of using a nop makes ASLR less effective. But then the commit message you’re quoting doesn’t mention that either.
[1] https://github.com/llvm/llvm-project/blob/b20c291baec94ba370...
I think it's a human thing. The Torah is succinct; The Talmud has a lot to say about it. For a large codebase, the comments would be huge, and also I think distracting.
In fact, as a former code auditor I can say that comments at times make bug finding harder -- they frame you up a certain way. I definitely preferred to audit without comments.
Anyway, there are definitely valid reasons. I think the commit log or dev notes files are generally preferable, especially when combined with good naming.