
Comment by aleph_minus_one

3 days ago

> The core problem is that decision-makers—often far removed from actual engineering work — believe that tacit knowledge can be replaced with documentation, tools, and processes. [It] cannot.

I am not so certain:

For example, I think that a lot of my knowledge about the system that I work on could be documented, and based on this documentation someone new could take over the system.

The problem, rather, is the volume of documentation that I would have to write would be insane; I'd consider tens of thousands of dense DIN A4 pages to be realistic - and this is a rather small system.

So, a new person who could take over this system would have to cram and understand basically all the details of this documentation insanely well.

This insane effort (writing the documentation; new workers on the project then having to cram and understand every detail of this incredibly bulky documentation) is something that no employer wants to spend money on: that is, in my experience, the real reason why it isn't done.

The deeper I wade through Microsoft’s Azure documentation, the more I feel the reality of this. There’s so much of it that it is basically unreadable in practical terms; most employees will never get the time allocated, and when you do try to exhaustively read up on a specific area, you find that the documentation is incomplete and wrong in subtle but important ways. I’m sure Microsoft spends a lot of resources on that documentation, but it seems a somewhat hopeless mission.

I think it's an important property of a system to be documentable, not just documented. What I mean, essentially, is that the system was designed around sound principles, and those principles were written down and followed.

I have seen this work only once in my life, and it was so nice to see, but yeah, most code is just a ball of twine, and even if there was a guiding principle beneath, it has long been abandoned and overruled, and the only way to understand the system is to take it all in at once.

  • I think it’s reasonably easy to design a system that’s documentable and documented. It’s very, very hard to maintain and iterate on a system while maintaining those properties.

    Hacky things will make their way in because it takes a month to do the documentable thing and a week to ship the hacky thing.

    It takes a lot of skilled people from varying disciplines to figure out what things are going to survive long enough and be important enough to spend the resources doing the right thing instead of the hacks.

    It bites both ways. I’ve seen core business products crippled by years of digital duct tape, but I’ve also seen internal tooling that never really becomes useful because they insist on doing the “correct” thing and it’s constantly a year behind what we need it to do.

    • Let me give you an example of what I referred to - I used to work on a big enterprise desktop app. It had a dockable, very complex user interface and a quite sophisticated data management layer that you were supposed to integrate with to display and query stuff.

      These core services were extremely well designed and documented, and if you wrote a component using them, you could be reasonably certain the UI displayed correctly, behaved consistently etc.

      But imo even more importantly, due to the patterns these components enforced, if you wrote a component like this, chances were somebody could go in, read your code, and understand what it did; and if you depended on external stuff, even then it was very clear what you depended on and how.

      A lot of the code used this correctly, but some didn't, for whatever reason. And the parts that didn't were utter hell to work with, as they could change state at any time, did things their own way, depended on implementation details, etc.

      You essentially had to understand the whole code base (which services call where, in what order things happen, etc.) to work with that code.

      Behavior of those components was essentially 'what the code did' and therefore undocumentable.

      For reasons, the amount of code belonging in the latter category grew and grew, and eventually it infected the core systems, as the invariants described by the documentation could basically no longer be expected to hold in any scenario, and the whole thing became an undocumentable ball of twine, where the code did things not because it made sense, but to work around bugs and quirks of other pieces of code.

It’s way easier (for this type of scenario) and far more effective to learn by doing than to learn by reading (even tens of thousands of pages of) documentation; that is the crux of it.

  • > It’s way easier (for this type of scenarios) and far more effective to learn by doing than to learn by reading

    I don't think so: the problem is that there exist lots of parts of the system that are quite complicated but which one very rarely has to touch - except in the rare (but real) case that something deep in such a part goes wrong or a requirement for this part pops up.

    If you "learned by doing" instead of reading, you are suddenly confronted with a very subtle and complicated subsystem.

    In other words: there mostly exist two kinds of tasks:

    - easy, regular adjustments

    - deep changes that require a really good understanding of the system

    • It's kind of just-in-time learning. It's no use going through and memorizing something you don't need in the short term: it's hard to memorize well, and by the time you need to draw on the knowledge, it's already hazy. This is why you can think of such documentation more as a reference manual than as plain documentation.

      In any case, AI is great for traversing a codebase and producing at least a draft of such documentation.

    • I tend to document some tricky non-obvious pieces of knowledge directly above the relevant code. "We have to do X below instead of obvious-first-idea-Y because Z".

      Any time a refactoring comes up that moves code around, AI (or my coworkers) removes those comments without thinking twice, and I need to tell them "hey, this is still valid".
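
      As an illustration, such a "why, not what" comment might look like this (the function, the chunk size, and the gateway quirk are all invented for the example):

      ```python
      def chunked_upload(data: bytes, chunk_size: int = 4 * 1024 * 1024):
          # We split the payload into 4 MiB chunks instead of the obvious
          # single-request upload because the (hypothetical) gateway in front
          # of this service silently truncates larger bodies.
          # Do NOT "simplify" this back into one request.
          for offset in range(0, len(data), chunk_size):
              yield data[offset:offset + chunk_size]
      ```

      The comment carries exactly the tacit knowledge that a naive refactor (or an AI pass) would otherwise destroy.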

This is such a weird counter-argument, that only serves to prove OP’s point.

“It’s not that it’s not documentable. It’s just that it would take tens of thousands of pages and no one would be able to write that or read that to effectively take over the project.”

Okay, so surely this is what OP had in mind when they said documentation doesn’t work… Is it no longer safe to assume reasonable expectations when making an argument? Why the need to “well actually” them with this response?

> [belief that] knowledge can be replaced with documentation, tools, and processes. [It] cannot.

> volume of documentation that I would have to write would be insane

I am not sure those are mutually exclusive. We all know of situations where a person knows of tiny and typically undocumented system quirks. We even have a corporate name for it: institutional knowledge. The issue is that executives think it can ALL somehow be done, when even cursory real-life project experience will quickly teach one how insane the average gap between documented and undocumented tends to be. Add to that near-constant changes to APIs, versions, systems, and people, and I can't help but wonder at executives who really do think this way.

Documentation should serve as a general overview of the system (purpose, architecture, etc.) and elaborate on the interface of that system. Other than documenting historical relics like ADRs, I see very granular documentation as a net negative.

It quickly becomes outdated, and at some point you just need to accept that only the code will be the most accurate source of truth.

But you've just perfectly described the tacit knowledge problem.

Yes, you can spend all your time writing docs, or just mentor a junior and let them grok the system through osmosis.

Also, your doc won't ever have 100% coverage unless you write an absolute tome. Tacit knowledge is the stuff that's so obvious you wouldn't even think of writing it down in the first place.