Comment by mywittyname

5 years ago

Well named functions are only half (or maybe a quarter) of the battle. Function documentation is paramount in complex codebases, since documentation should describe various parameters in detail and outline any known issues, side-effects, or general points about calling the function. It's also a good idea to document when a parameter is passed to another function/method.

Yeah, it's a lot of work, but working on recent projects have really taught me the value of good documentation. Naming a function send_records_to_database is fine, but it can't tell you how it determines which database to send the records to, or how it deals with failed records (if at all), or various alternative use cases for the function. All of that must come from documentation (or reading the source of that function).

Plus, I've found that forcing myself to write function documentation, and justify my decisions, has resulted in me putting more consideration into design. When you have to say, "this function reads <some value> name from <environmental variable>" then you have to spend some time considering if future users will find that to be a sound decision.

> documentation should describe various parameters in detail and outline any known issues, side-effects, or general points about calling the function. It's also a good idea to document when a parameter is passed to another function/method.

I'd argue that writing that much documentation about a single function suggests that the function is a problem and the "send_records_to_database" example is a bad name. It's almost inevitable that the function doing so much and having so much behavior that needs documentation will, at some point, be changed and make the documentation subtly wrong, or at least incomplete.

  • What's the alternative? Small functions get used in other functions. Eventually you end up with a function everyone's calling that's doing the same logic, just itself calling into smaller functions to do it.

    You can argue that there should be separate functions for `send_to_database` and `lock_database` and `format_data_for_database` and `handle_db_error`. But you're still going to have to document the same stuff. You're still going to have to remind people to lock the database in some situations. You're still going to have to worry about people forgetting to call one of those functions.

    And eventually you're going to expose a single endpoint/interface that handles an entire database transaction including stuff like data sanitation and error handling, and then you're going to need to document that endpoint/interface in the same way that you would have needed to document the original function.

    • > Small functions get used in other functions. Eventually you end up with a function everyone's calling that's doing the same logic, just itself calling into smaller functions to do it.

      Invert the dependencies. After many years of programming I started deliberately asking myself "hmm, what if, instead of A calling B, B were to call A?" and now it's become part of my regular design and refactoring thinking. See also Resource Acquisition Is Initialization.

      4 replies →

Yikes, I hope I don't have to read documentation to understand how the code deals with failed records or other use cases. Good code would have the use cases separated from the send_records_to_database so it would be obvious what the records were and how failure conditions are handled.

  • How else are you going to understand how a library works besides RTFM or RTFC? I guess the third option is copy pasta from stack overflow and hope your use case doesn't require any significant deviation?

    You seriously never have to read documentation?

    Must be nice, I've been balls-deep in GCP libraries and even simple things like pulling from a PubSub topic have footguns and undocumented features in certain library calls. Like subscriber.subscribe returns a future that triggers a callback function for each polled message, while subscriber.pull returns an array of messages.

    That's a pretty damn obvious case where functions should have been named "obviously" (pull_async, pull_sync), yet they weren't. And that's from a very widely used service from one of the biggest tech companies out there, written by a person that presumably passed one of the hardest interviews in the industry and gets paid in the top like 1% of developer.

    Without documentation, I would have never figured those out.

"Plus, I've found that forcing myself to write function documentation, and justify my decisions, has resulted in me putting more consideration into design."

This, this, and... this.

Sometimes, I step back after writing documentation and realise, this is a bunch of baloney. It could be much simpler, or this is a terrible decision! My point: Writing documentation is about expressing the function a second time -- the first time was code, the second time was natural language. Yeah, it's not a perfect 1:1 (see: the law in any developed country!), but it is a good heuristic.

Documentation is only useful it is up to date and correct. I ignore documentation because I've never found the above are true.

There are contract/proof systems that seem like they might work help. At least the tool ensures it is correct. However I'm not sure if such systems are readable. (I've never used one in the real world)

  • Oh I agree, but a person who won't take the time to update documentation after a significant change, certainly isn't going to refactor the code such that the method name matches the updated functionality. Assuming they can even update the name if they wanted to.

    After all, documentation is cheap. If you're going to write a commit message, why not also update the function docs with pretty much the same thing? "Filename parameter will now use S3 if an appropriate URI is passed (i., filename='s3://bucket/object/path.txt'). Note: doesn't work with path-style URLs."