Comment by stego-tech
1 day ago
> The SQLite source code is over 35% comment.
This is a gold standard for documenting source code, in my experience. It doesn’t entirely solve the “bus” problem, but reduces the barrier to entry for new maintainers or project resuscitators substantially by making it easy to understand what’s going on and why decisions were made.
Admittedly, my own code is also in that 35% range for commenting, so I’m inherently biased towards that threshold specifically. YMMV.
I don't aim for any specific value, but I find that my code is decidedly bimodal: any given file is either close to zero comments or in the 30-50% range by lines. More specifically, I try to structure my code so that the "30-50% comments" code is what enables me to write the zero-comment code.
I worked that way at first, but found that I had difficulty parsing my own zero-comment code years after I wrote it. Nowadays I preface every file or snippet I write with a README block summarizing the intent of the code, and I recast my pseudocode as comments once the actual code is written out.
This naturally gets me close to the one-third mark, I find, without really trying. When that code is shared with others, they typically report back that it's very easy to read quickly and adjust.
But again, that’s my entirely subjective experience.
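A minimal Python sketch of what the README-block-plus-pseudocode workflow described above might look like. The `latest_per_key` function and its record format are hypothetical, invented purely for illustration: the module docstring plays the README role, and each step comment is what would originally have been a line of pseudocode.

```python
"""
README
------
Purpose: collapse a list of (key, timestamp, value) records so that only
the newest record per key survives.
Why: (hypothetical) upstream feeds may resend stale rows, and consumers
want exactly one row per key.
"""


def latest_per_key(records):
    """Hypothetical example function; not from the original thread."""
    # Step 1 (was pseudocode: "track the newest record seen per key").
    newest = {}
    for key, ts, value in records:
        # Step 2: keep this record only if it is newer than the one stored.
        if key not in newest or ts > newest[key][1]:
            newest[key] = (key, ts, value)
    # Step 3: return results sorted by key so the output is deterministic.
    return [newest[k] for k in sorted(newest)]
```

For example, `latest_per_key([("a", 1, "x"), ("b", 2, "y"), ("a", 3, "z")])` keeps only the newer `"a"` row alongside the single `"b"` row.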
The percentage of comments by line count measures neither the quality of the comments nor the quality of the codebase.
It's rather easy to get over 50% by putting a comment above each line of code, containing the output of an LLM that's asked "what does this line do" and supplied with exactly that line of code. It's much harder to make sure the comments make sense and actually add value.
"The SQLite source code is over 35% comment. Not boiler-plate comments, but useful comments that explain the meaning of variables and objects and the intent of methods and procedures. The code is designed to be accessible to new programmers and maintainable over a span of decades. "
They try!
This got me thinking: it would be interesting to strip the comments out of a codebase, have an LLM regenerate them, and see which of the original comments it can't reproduce.
If the LLM can produce a similar enough comment from scratch, would it be better to just have an IDE that dynamically injects comments when you need them, as opposed to keeping them in version control?
One of the stated goals is long-term support and maintainability. Adding a dependency like an IDE is already a large step away from that goal, and depending on an LLM's non-auditable output moves even further from it.
Comments in source code will always reflect the maintainer's intention, and they are much more likely to cover the cases comments are meant for: unintuitive cases or decisions, unclear algorithms, general usage notes that point maintainers in the right direction, and so on. More importantly, comments in the source code require no additional tools or other dependencies, and as such are more dependable.
Why would I want comments produced by a roll of the dice rather than by the human who was in the thick of it?
I would instead be willing to consider some kind of QC assessment: where does the AI think a comment no longer matches the code because the two have fallen out of sync?
Is that by line count or by character count? (So I know whether I should instruct my LLM to either write haikus or verbose/hippopotomonstrosesquipedaliophilic comments.)
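The choice of denominator really does change the number. A rough Python sketch that reports both ratios, assuming Python source where only `#` comments count (docstrings are ignored, since `tokenize` sees them as strings), blank lines are excluded, and any line carrying a comment, including a trailing one, counts as a comment line. The function name `comment_ratios` is hypothetical.

```python
import io
import tokenize


def comment_ratios(source: str) -> tuple[float, float]:
    """Return (line_ratio, char_ratio) of '#' comments in Python source.

    Hypothetical helper: docstrings are NOT counted as comments, and a
    line with a trailing comment counts as a full comment line.
    """
    comment_lines = set()
    comment_chars = 0
    # Let the real tokenizer find comments so '#' inside strings is ignored.
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.COMMENT:
            comment_lines.add(tok.start[0])  # 1-based line number
            comment_chars += len(tok.string)
    # Denominators: non-blank lines, and their stripped character counts.
    lines = [ln for ln in source.splitlines() if ln.strip()]
    total_chars = sum(len(ln.strip()) for ln in lines)
    line_ratio = len(comment_lines) / len(lines) if lines else 0.0
    char_ratio = comment_chars / total_chars if total_chars else 0.0
    return line_ratio, char_ratio
```

A terse haiku and a long-winded comment each add exactly one comment line, but they move the character ratio very differently, so the same file can look "35% commented" by one measure and not by the other.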
AI-generated comments are useless unless you fix them up; otherwise the reader can generate the AI explanations themselves, and you're not adding anything.