Comment by ofrzeta

1 day ago

Quite the praise by Grady Booch:

"There are only a few comments in the version 1.0 source code, most of which are associated with assembly language snippets. That said, the lack of comments is simply not an issue. This code is so literate, so easy to read, that comments might even have gotten in the way."

"This is the kind of code I aspire to write."

> the lack of comments is simply not an issue

I'm looking at the code and just cannot agree. If I look at a command like "TRotateFloatCommand.DoIt" in URotate.p, it's 200 lines long without a single comment. I look at a section like this and there's nothing literate about it. I have no idea what it's doing or why at a glance:

  pt.h := BSR (r.left + ORD4 (r.right), 1);
  pt.v := BSR (r.top + ORD4 (r.bottom), 1);
  
  pt.h := pt.h - BSR (width, 1);
  pt.v := pt.v - BSR (height, 1);
  
  pt.h := Max (0, Min (pt.h, fDoc.fCols - width));
  pt.v := Max (0, Min (pt.v, fDoc.fRows - height));
  
  IF width > fDoc.fCols THEN
    pt.h := pt.h - BSR (width - fDoc.fCols - 1, 1);
  
  IF height > fDoc.fRows THEN
    pt.v := pt.v - BSR (height - fDoc.fRows - 1, 1);
  

Just breaking up the function with comments delineating its four main sections and what they do would be a start. As would simple things like commenting e.g. what purpose 'pt' serves -- the code block above is where it is first defined, but you can't guess what its purpose is until later when it's used to define something else.

Good code does not make comments unnecessary or redundant or harmful. This is a myth that needs to die. Comments help you understand code much faster, understand the purpose of variables before they get used, understand the purpose of functions and parameters before reading the code that defines them, etc. They vastly aid in comprehension. And those are just "what" comments I'm talking about -- the additional necessity of "why" comments (why the code uses x approach instead of seemingly more obvious approach y or z, which were tried and failed) is a whole other subject.

  • That particular code is idiomatic to anyone who worked with 2D bitmap graphics in that era.

    pt == point, r == rect, h, v == horizontal, vertical; BSR(...,1) is a fast integer divide by 2; ORD4 promotes an expression to a 4-byte integer (LONGINT)

    The algorithms are extremely common for 2D graphics programming. The first is to find the center of a 2D rectangle, the second offsets a point by half the size, the third clips a point to be in the range of a rectangle, and so on.

    Converting the idiomatic math into non-idiomatic words would not be an improvement in clarity in this case.

    (Mac Pascal didn't have macros or inline functions, so writing expressions out inline like this was the way to go for performance.)

    It's like using i,j,k for loop indexes, or x,y,z for graphics axes.
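As a rough illustration of the idioms listed above, here is a minimal C sketch. The helper names `bsr`, `ord4`, and `midpoint` are hypothetical, and the sketch assumes BSR is a logical (zero-filling) right shift and ORD4 widens a 16-bit value to 32 bits, as described in this thread; the original compiler's exact semantics are not verified here.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed semantics of Pascal's BSR: logical (zero-filling) right shift
   on a 32-bit value. Hypothetical helper; count must be in 1..31. */
static int32_t bsr(int32_t value, unsigned count)
{
    assert(count >= 1 && count <= 31);
    return (int32_t)((uint32_t)value >> count);
}

/* Assumed semantics of ORD4: widen a 16-bit coordinate to 32 bits. */
static int32_t ord4(int16_t value)
{
    return (int32_t)value;
}

/* Midpoint of two 16-bit coordinates, mirroring
   `BSR (r.left + ORD4 (r.right), 1)` from the quoted snippet. */
static int32_t midpoint(int16_t a, int16_t b)
{
    return bsr(a + ord4(b), 1);
}
```

Under these assumptions, `midpoint(30000, 30000)` gives 30000 even though the sum 60000 would not fit in 16 bits, which is presumably why one operand is widened first. A logical shift only matches division by 2 for non-negative values: `bsr(-3, 1)` yields 2147483646 rather than -1.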

    • > Converting the idiomatic math into non-idiomatic words would not be an improvement in clarity in this case.

      You seem to be missing my point. It's not about improving "clarity" about the math each line is doing -- that's precisely the kind of misconception so many people have about comments.

      It's about, how long does it take me to understand the purpose of a block of code? If there was a simple comment at the top that said [1]:

        # Calculate top-left point of the bounding box
      

      then it would actually be helpful. You'd understand the purpose, and understand it immediately. You wouldn't have to decode the code -- you'd just read the brief remark and move on. That's what literate programming is about, in spirit -- writing code to be easily read at every level of the hierarchy. And very specifically not having to read every single line to figure out what it's doing.

      The original assertion that "This code is so literate, so easy to read" is demonstrably false. Naming something "pt" is the antithesis of literate programming. And if you insist on no comments, you'd at least need to name it something like "bbox_top_left". A generic variable name like "pt", that isn't even introduced in the context of a loop or anything, is a cardinal sin here.

      [1] https://news.ycombinator.com/item?id=46366341

    • Xyz makes sense because that is what those axes are literally labeled, but ijk I will rail against until I die.

      There's no context in those names to help you understand them, you have to look at the code surrounding it. And even the most well-intentioned, small loops with obvious context right next to it can over time grow and add additional index counters until your obvious little index counter is utterly opaque without reading a dozen extra lines to understand it.

      (And i and j? Which look so similar at a glance? Never. Never!)

  • As other comments have mentioned, context does matter, and as someone with a lot of 2D image/pixel processing experience, other than the 'BSR' and 'ORD4' items - which are clearly common in the codebase and in that era of computing, all that code makes perfect sense.

    Also, breaking things down to more atomic functions wasn't the best idea for performance-sensitive things in those days, as compilers were not as good about knowing when to inline and not: compiler capabilities are a lot better today than they were 35 years ago...

  • This actually looks surprisingly straightforward for what the function is doing - certainly if you have domain context of image editing or document placement. You'll find it in a lot of UI code - this one uses bit shifts for efficiency but what it's doing is pretty straightforward.

    For clarity and to demonstrate, this is basically what this function is doing, but in css:

    .container {
      position: relative;
    }

    .obj {
      position: absolute;
      left: 50%;
      top: 50%;
      transform: translate(-50%, -50%);
    }

  • BSR = bitwise right-shift

    ORD4 = cast as 32bit integer.

    BSR(x,1) simply meant x divided by 2. This was a very common coding idiom back in those days, when compilers didn't do much optimization and a bitwise shift was much faster than division.

    The snippet in C would be:

        pt.h = (r.left + (int32_t)r.right) / 2;
        pt.v = (r.top + (int32_t)r.bottom) / 2;
    
        pt.h -= (width / 2);
        pt.v -= (height / 2);
      
        pt.h = max(0, min(pt.h, fDoc.fCols - width));
        pt.v = max(0, min(pt.v, fDoc.fRows - height));
      
        if (width > fDoc.fCols) {
          pt.h -= (width - fDoc.fCols - 1) / 2;
        }
      
        if (height > fDoc.fRows) {
          pt.v -= (height - fDoc.fRows - 1) / 2;
        }

  • Are you familiar with the domain?

    Because it's quite clear, everything is well named, and the filename also gives the context.

  • Finds the center of a rectangle r.

    Positions a width × height region centered on that rectangle.

    Clamps the result so it doesn’t go outside the document.

    If the region is bigger than the document, it re-centers instead of snapping to (0,0).
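The four steps summarized above can be sketched as a small, commented C function. This is only an illustration under stated assumptions: `place_region`, `Point`, `Rect`, `min_i32`, `max_i32`, `doc_cols`, and `doc_rows` are hypothetical names (the last two stand in for `fDoc.fCols` and `fDoc.fRows`), and `/ 2` is used in place of the shift, which matches only when the values involved are non-negative.

```c
#include <assert.h>
#include <stdint.h>

typedef struct { int32_t h, v; } Point;                     /* h = horizontal, v = vertical */
typedef struct { int16_t top, left, bottom, right; } Rect;  /* hypothetical 16-bit rect */

static int32_t min_i32(int32_t a, int32_t b) { return a < b ? a : b; }
static int32_t max_i32(int32_t a, int32_t b) { return a > b ? a : b; }

/* Top-left corner at which a width x height region should be placed so that
   it is centered on r, kept inside a doc_cols x doc_rows document if it fits. */
static Point place_region(Rect r, int32_t width, int32_t height,
                          int32_t doc_cols, int32_t doc_rows)
{
    Point pt;
    assert(width >= 0 && height >= 0);

    /* 1. Center of the rectangle (one operand widened to 32 bits first). */
    pt.h = (r.left + (int32_t)r.right) / 2;
    pt.v = (r.top + (int32_t)r.bottom) / 2;

    /* 2. Step back by half the region size: top-left of the centered region. */
    pt.h -= width / 2;
    pt.v -= height / 2;

    /* 3. Clamp so the region does not extend outside the document. */
    pt.h = max_i32(0, min_i32(pt.h, doc_cols - width));
    pt.v = max_i32(0, min_i32(pt.v, doc_rows - height));

    /* 4. A region larger than the document was clamped to 0 above;
          shift it back so it overhangs both edges roughly equally. */
    if (width > doc_cols)
        pt.h -= (width - doc_cols - 1) / 2;
    if (height > doc_rows)
        pt.v -= (height - doc_rows - 1) / 2;

    return pt;
}
```

For example, a 120-wide region in a 100-column document ends up at `h = -9`, so it overhangs by 9 columns on the left and 11 on the right instead of snapping to 0.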

  • The code's functionality is immediately obvious to me as someone who works a lot with graphics coordinate systems.

    I'm sure the code would be immediately obvious to anyone who would be working on it at the time.

    Comments aren't unnecessary, they can be very helpful, but they also come with a high maintenance cost that should be considered when using them. They are a long-term maintenance liability because, by design, the compiler ignores them, so it's very easy to change or refactor code, miss updating a comment, and end up with a comment that is misleading or just plain wrong.

    These days one could make some sort of case (though I wouldn't entirely buy it, yet) that an LLM-based linter could be used to make sure comments do not get disconnected from the code they are documenting, but in 1990? not so much.

    Would I have used longer variable names for slightly more clarity? Today, sure. In 1990, probably not. Temporal context is important and compilers/editors/etc have come a long way since then.

  • It’s not a myth, it’s a sound software engineering principle.

    Every comment is a line of code, and every line of code is a liability, and, worse, comments are a liability waiting to rot, to be missed in a refactor, and waiting to become a source of confusion. It’s an excuse to name things poorly, because “good comment.” The purpose of variables should be in their name, including units if it’s a measurement. Parameters and return values should only be documented when not obvious from the name or type—for example, if you’re returning something like a generic Pair, especially if left and right have the same type. We’ve been living with autocomplete for decades; you don’t need to make variables short to type.

    The problem with AI-generated code is that the myth that good code is thoroughly commented code is so pervasive that the default output mode for generated code is to comment every darn line it generates. After all, in software education, they don’t deduct points for needless comments, and students think their code is now better w/ the comments, because courses almost never teach how to write good code. Usually you get kudos for extensive comments. And then you throw away your work. The computer science field is littered with math-formula-influenced, space-saving one- or two-letter identifiers with barely any recognizable semantic meaning.

    • No amount of good names will tell you why something was done a certain way, or just as importantly why it wasn't done a certain way.

      A name and signature is often not sufficient to describe what a function does, including any assumptions it makes about the inputs or guarantees it makes about the outputs.

      That isn't to say that it isn't necessary to have good names, but that isn't enough. You need good comments too.

      And if you say that all of that information should be in your names, you end up with very unwieldy names, that will bitrot even worse than comments, because instead of updating a single comment, you now have to update every usage of the variable or function.
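A small, hypothetical C sketch of the point about signatures: `clamp_int` and its parameter names are invented for illustration. The name and types say what the function roughly does, but the input assumption and the output guarantee only appear in the comment (and, here, in an assertion).

```c
#include <assert.h>

/*
 * Clamp `value` into the inclusive range [lo, hi].
 *
 * Assumes:    lo <= hi. The signature alone cannot express this.
 * Guarantees: the result lies in [lo, hi], and equals `value`
 *             whenever `value` is already inside the range.
 */
static int clamp_int(int value, int lo, int hi)
{
    assert(lo <= hi);   /* input assumption, checked in debug builds */
    if (value < lo)
        return lo;
    if (value > hi)
        return hi;
    return value;
}
```

Encoding the same contract in the name alone would need something like `clamp_int_assuming_lo_not_above_hi`, which is the kind of unwieldy name described above.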

    • >> Every comment is a line of code, and every line of code is a liability, and, worse, comments are a liability waiting to rot,

      This is exactly my view. Comments, while they can be helpful, can also interrupt the reading of the code. They are also not verified by the compiler; curiously, in an era when everyone goes crazy for Rust safety, there is nothing less safe than comments, because they are completely ignored.

      I do not oppose comments. But they should be used only when needed.

    • No. What you are describing is exactly the myth that needs to die.

      > comments are a liability waiting to rot, to be missed in a refactor, and waiting to become a source of confusion

      This gets endlessly repeated, but it's just defending laziness. It's your job to update comments as you update code. Indeed, they're the first thing you should update. If you're letting comments "rot", then you're a bad programmer. Full stop. I hate to be harsh, but that's the reality. People who defend no comments are just saying, "I can't be bothered to make this code easier for others to understand and use". It's egotistical and selfish. The solution for confusing comments isn't no comments -- it's good comments. Do your job. Write code that others can read and maintain. And when you update code, start with the comments. It's just professionalism, pure and simple.

  • Man I just don’t know who to believe, you or the Chief Scientist for Software Engineering at IBM Research Almaden.