Internal RFCs saved us months of wasted work

6 days ago (highimpactengineering.substack.com)

Not a panacea for broken culture.

I've just left a company that used to have an internal RFC process and it was a very significant barrier to progress that stifled innovation, led to breakdown of working relationships and caused the most productive engineers to run for the exits.

RFC is a request for comments, and it turns out you have to be really careful about the kinds of comments you solicit and who from. As soon as you ask people to comment you are setting an expectation that you will take their feedback on board and address their points, but there’s a real asymmetry here - it is much easier to leave a critical comment, or to ask a question, than it is to address the concerns or answer the question.

This asymmetrical nature puts much more work on the shoulders of the RFC author. Similarly, the people writing the RFC have typically thought much harder and longer and deeper about the problem than the people giving feedback on it. This leads to authors having to re-explain their thinking in detail, covering points that they’d omitted for brevity or because they are obvious to those with a good understanding of the problem.

Suddenly, the task changes from shipping the feature or making the change to achieving consensus on the RFC. I have seen that process take months - far longer and far more expensive than just doing the work would be.

The worst part is - no one is in the wrong! The authors write the RFC in good faith, the commenters review the RFC in good faith, and they rightly expect that any problems they identify are addressed. But the whole process can be soul-crushingly slow, painful and full of professional conflict.

I’m not saying that RFCs are bad overall, but you have to have a culture of accountability, pragmatism and getting things done to actually make this process work.

  • Folks with big titles will always write comments that sound smart and thoughtful but in reality hinder the process. For example:

    - This architecture binds us to AWS. Have we estimated the engineering effort to remain cloud-agnostic in case we need to move to Azure next year?

    - I see we're using Postgres. Have we considered how we’ll handle horizontal sharding if our user base grows by 1000x in Q4?

    - This synchronous API call introduces tight coupling. Shouldn't this be an event-driven architecture to handle back-pressure?

    These are all easy to ask and sound prudent to management, but are impossibly expensive to answer or implement for a feature that just needs to ship.

    • They only hinder the process if you treat them as demands instead of questions or comments. The questions are generally smart and thoughtful, just often mistimed/misplaced for where most companies decide to have the RFC process (i.e. often after completion rather than before starting main implementation).

      It's alright to answer "No, we haven't done estimations for cloud-agnostic architecture as part of this project since it was not part of the approved goals and requirements. If we decide to go multi-cloud at some point in the future, this architecture will need to be reviewed with the rest of the infrastructure as part of the migration plan".

      If that kind of answer creates a problem then the issue wasn't really to do with the RFC or the comments, that's just where the other issues in the process became apparent. Namely, the requirements are being set without all relevant stakeholders involved or even aware (which is not the same thing as agreeing).

    • The folks with big titles need to determine if the company's technical strategy is cloud-agnostic and whether 1000x growth in Q4 is a legitimate concern.

      If Big Title wants to own the schedule impact they can make these demands.

      Maybe I'm not read into the secret deal with Microsoft for next quarter that'll require all 3 of these.

    • I would open a new bug for each of those questions and say “we will evaluate this after the MVP is implemented”. Give the person credit in the bug description. That will usually satisfy their concerns. Set the priority on the bugs to low and I’ll never even have to look at them again, unless one of them actually becomes a problem.

    • They're looking for the level of answer you might get instantly from an LLM. I think figuring out the right balance of "ask ChatGPT", critical thought, and ownership in these situations is going to be tricky.

    • In part because of the drive-by questions, they also often force waterfall-like design: few like the answer to their question being "idk, we'll find out when we get there".

      So rather than a lightweight doc that looks for obvious gaps, it becomes a giant playbook for every eventuality. While the company claims they're being "agile".

  • You don't really want company-wide consensus on every point in the RFC. That should be reserved for the author(s) and their immediate teams. Setting/allowing the expectation that any comment made will be reviewed and responded to until everyone is in agreement about it is going to make things miserable for everyone and improve nothing.

    I.e. that culture of pragmatism needs to be defined as part of the expectations process, not an unspoken promise between coworkers. No process is perfect but a footgun process is easy to make anyways.

  • In my organization we have RFCs, PRDs, ADRs etc. and I would say that the process is fairly broken. That said, I think what you mention is an important but not the only failure mode of a proposals process.

    In some cases I have seen, people use RFCs to steamroll decisions when they are the only stakeholders. Here the waste comes from the fact that the proposal becomes just a bureaucratic step or a way to diffuse responsibility.

    In the case you mention (which I have seen many times) I would say the general issue is that the goals and the constraints are not qualified sufficiently. If that's the case, then there are only 2 cases: there is an objective way to measure if an objection or comment makes sense or not, or it is subjective. If it's objective, then consensus is easy to reach; if it's subjective, it needs to be acknowledged, and usually the decision falls on those who are responsible for the outcome (e.g., the team who needs to maintain or build the thing).

    Of course, the debate can move to constraints, goals or even ways to measure, but these are generally more straightforward.

    • Have you worked at an organization without RFCs, ADRs, etc? The alternative is really just the wild west and whatever politics or pull a person has. RFCs and ADRs are good in the sense that they document _something_; even if the document is junk, it's better than an assumption.

      Really though it's the organization (and people) that makes or breaks anything.

  • > This leads to authors having to re-explain their thinking in detail

    What I like about RFCs (or similar documents) is the ways they work to prevent this. Recently I was involved in planning an initiative without a document like this, and we had to keep explaining and re-explaining the motivation for our decisions to stakeholders and higher-ups. With a document (assuming everyone reads the document before giving feedback), most questions get pre-empted; the ones that don't only need to be addressed once, because the answers end up in the version of the doc which you show to the next person.

    Certainly I think it's worth being selective about who you're soliciting comments from, to avoid a too-many-cooks situation, but rare is the project that doesn't need anyone's approval or feedback. Presenting a big fat document gives a sense for the amount of thought that has gone into the design, which quells the kind of off-the-cuff "why not X?" comment you might get in response to a boxes-and-arrows chart and a high-level summary.

  • There is no perfect process. I think after a long time working in software I now understand that it works like this: you need one person to work by themselves really really quickly to ship and create the foundations. After that, they need to remain in place to make sure the design continues to make sense as more people add code to the monolith. Having a specification process is a must in complex environments, and the downside is that it adds friction (and some people can abuse the back and forth to prevent you from landing your design changes), so you need that process to happen as late as possible. But if you're running in production in sensitive environments then you will need to make that process happen earlier than desired.

    • > you need one person to work by themselves really really quickly to ship and create the foundations

      This is literally the secret of every successful project I've worked on. An individual, with high agency and low communication overhead has either been trusted to take the task on themselves, or they've done it secretly to side-step the approval process and organisational overhead. When the foundations are in place it's relatively easy to add engineers, but starting something from scratch with a bunch of people? doomed before it starts in most cases imo

  • My take is that an RFC should be very early in the engineering process, like as part of a proof of concept phase, and should not block progress towards completing a design proposal. The design proposal should list any legitimate alternatives to overall or component designs discussed during RFC along with the reasoning for not using them in a "designs not chosen" appendix. This at least gives your engineering leadership an opportunity to evaluate the general design ideas before anyone is prepared to die on the hill of those ideas.

    Architecture / Design review happens post proof of concept but still before any significant development work and major action items are blockers to beginning development. Further discussion about designs not chosen can happen here, especially when a flaw is uncovered that would be addressed by one of those not chosen.

  • Like many things in dev it sounds sooo good on the surface, but is a minefield in practice (Brandolini's law + The Iron Law of Bureaucracy for starters).

    I'd only advocate it in a very carefully curated team.

  • > This leads to authors having to re-explain their thinking in detail, covering points that they’d omitted for brevity or because they are obvious to those with a good understanding of the problem.

    IMHO there should be no gaps like that. If you already thought hard and long about these problems, why not just write all your knowledge down, so other people can benefit from that? Also, this makes your work a lot more accessible to those who came after you. Of course, you have to assume a certain baseline of knowledge. But if you already have a good understanding of the problem, why not formulate that as part of the RFC?

    • It is impossible for people to write all their knowledge down, there will always be gaps, there will always be things taken for granted.

      The point is that the author thought that they had captured enough information in their original draft - they can expand on that draft of course, but that's the issue - the task of writing the RFC can become much larger than just trying something out and getting real data.

  • > This leads to authors having to re-explain their thinking in detail, covering points that they’d omitted for brevity or because they are obvious to those with a good understanding of the problem.

    There's nothing wrong with this. Being able to explain your thinking in detail to someone that doesn't necessarily understand the problem is a pretty good exercise to make sure you yourself fully understand the problem _and your thinking._ Of course, this can't turn into a lecture on basic things people should know or have looked up before commenting.

    • Sure, now imagine answering all the questions of 10 different people. It's the largest hindrance I have ever seen, but I agree with the above comment that it largely depends on the team.

  • Yeah I think it all boils down to culture. Tools like RFC (and anything else) can help propel a good culture forward. But you can't fix a broken culture with a tool.

  • I find it funny that what you said summarizes the publishing process in academia very well. Except that it's much, much worse.

  • This is my experience as well: RFCs turn out to be used for internal processes and policy making. I also have the feeling that people treat them like a Request for Approval for technical changes, so we have to spend significantly more time on the RFCs than on the real work.

  • I feel this tension, but I think something has gone awry if people feel like every comment has to be addressed to everyone’s satisfaction. Comments are not commitments, and commenters don’t have an equal stake in the decision—they certainly don’t own the decision, and it’s okay to disagree and commit.

    It’s also probably worth being explicit that there is a cost to inaction that can exceed the costs of building the wrong thing. A year or two ago I was lead on a project where we didn’t know the answer to a big ambiguous problem—we just didn’t have a good way to get the information necessary to make the right decision—different people had different ideas about what we should do. So we identified the smallest thing we could build that would be useful such that we could get more information from real world use, knowing full well we might have to rebuild something else from the ground up. And we did! But we were able to get the confidence we needed to build the right thing later! And we got there much faster than if we had tried to deliberate and speculate about what that thing would be.

    Also, in that saga, one engineer in particular was really adamant that we address his particular set of concerns. He was unable to disagree and commit—he was kind of religious about all concerns being addressed before moving forward with anything. He had a very difficult time understanding that the RFC process existed to meet business goals—the business did not exist to slot into his ideal RFC process. He is no longer with the company.

Love the title. Reminds me of one of my favorite quotes: "The single biggest problem in communication is the illusion that it has taken place."

This is what user stories were supposed to accomplish in a more lightweight way.

The whole scrum DoR (definition of ready) status means that something is clear and ready for development.

Stories are written and are sent to the engineering team for clarification. This is where the comments are supposed to come in. There is a clear step for clarification of stories, before the story is ready for development. It gets marked as DoR when that clarification is done.

It does not matter if you use RFCs, user stories, or hallway conversations as your process of clarifying work. If it does not work, it does not work.

Any way you can get your teams to communicate more clearly is great.

  • > "The single biggest problem in communication is the illusion that it has taken place."

    Love this! Corollary: when you have too many meetings, that’s easy to notice. When you don’t have enough meetings, that’s harder to notice.

    I’m in the process of carefully adding meetings and process to our small team of 6 (we had a PM from a large company drop in a few years ago and haphazardly add a bunch of process, and it didn’t really help).

    We’re fully remote and have a daily huddle and, on average, 1 hour of meetings a week. It turns out this isn’t enough. So far, each bit of communication we’ve added has resulted in better outcomes and higher morale because we feel more like a team.

> The RFC approach has several advantages over verbal alignment. First of all, it is more precise. The need to write forces the author to clearly structure their thoughts into a coherent logical narrative. While writing, the author has time to examine their proposed solution from different angles and clearly see pros and cons of it.

> Another advantage of the document over verbal explanation is that a well-written RFC leaves little room for misinterpretation. It can include diagrams, examples, or calculations to illustrate and support the idea.

> Finally, we can return and reread the RFC later. Human memory is unreliable; already after a day, details that were crystal clear in one’s mind start to get blurry. When these details are written down, it is easy to review them at any time.

‘You have to write things down, because spoken words disappear into the air,’ was one of the first bits of feedback I received in my teacher training.

> The most common objection is that writing proposals is “a waste of time” compared to writing code.

The extra time spent writing is actually spent thinking.

  • >> The most common objection is that writing proposals is “a waste of time” compared to writing code.

    > The extra time spent writing is actually spent thinking.

    Common theme for decades is "we can save a few days of planning with just a few weeks of programming".

    But then there's the darker realization that sometimes the people you are working for are incapable of reasoning about planning artefacts or understanding how the system will look or operate simply from a document. So you need to present the system in small iterative chunks and repeatedly re-align expectations with reality: Agile.

    And sometimes you genuinely need to do exploratory work which doesn't fit into a planning framework - actual research!

    • > sometimes the people you are working for are incapable of reasoning about planning artefacts or understanding how the system will look or operate simply from a document

      I’m wrestling with this now. Over my career I’ve seen a strong correlation between good writers and good software engineers, but not everyone fits this mold. Shorter cycles and more chances for communication and feedback are helpful here.

  • >> The most common objection is that writing proposals is “a waste of time” compared to writing code.

    > The extra time spent writing is actually spent thinking.

    Until someone decides that using ChatGPT to write your RFC is a good idea. Then you get something that looks great, but the person behind the prompt actually understands less.

    • "Eventually they realized that this was something they were going to have to sort out, and they passed a law decreeing that anyone who had to carry a weapon as part of his normal Silastic work (policemen, security guards, primary school teachers, etc.) had to spend at least forty five minutes every day punching a sack of potatoes in order to work off his or her surplus aggressions. For a while this worked well, until someone thought that it would be much more efficient and less time-consuming if they just shot the potatoes instead. This led to a renewed enthusiasm for shooting all sorts of things..."

      - Douglas Adams, "Life, the Universe, and Everything"

      (It took an unreasonably long time to find this quote!)

    • Oh I really worry about that. AI code at least needs to pass unit tests, but there's no way to prove that the ideas in an AI document make sense until you try them and run into issues. Writing is thinking. If you let a robot do it, you aren't.

    • I’m currently fighting the “don’t use Gemini to write internal documents” war at my company. It’ll be long and hard, but I think I’ll eventually prevail.

      Every time someone throws a document written by AI at me, it feels so disrespectful.

I've tried suggesting this for my team since there are constant complaints of lack of communication. However, the response to this is "we have Teams/Jira/Confluence", but 99% of Jira tickets have no comments for clarification, Confluence has articles that are out of date by 5 years and Teams is never used for clarifying requirements.

  • That's like me trying to get my son to wash his hair and him responding by saying "We have shampoo in the shower."

    I am right and my son is right, but his hair is still not washed.

    I became a more effective manager and a better father when I learned how to talk to him better.

    • Would you share how? Your comment leaves us with a cliff-hanger.

      I also don't get your son's response at all. How is he contradicting you at all and how does that lead to unwashed hair?

  • It's not the lack of communication. IMO, it's the lack of team culture. Keeping documentation up to date is something only the team can do. And it can't be solved by using Confluence/Wiki/mailing list tools.

Been trying to decide whether adopting a traditional RFC process or Oxide's RFD (https://rfd.shared.oxide.computer/rfd/0001) would better suit my team. We're using ADRs at the moment, but we've ended up mixing a discussion-like process and a review process into them, and using ADRs more like RFCs/RFDs.

We started doing quarterly RFCs at Newscatcher, and it was a big game-changer. We're entirely remote.

I got this idea from Netflix's founder's book "No Rules Rules" (highly recommend it)

Overall, I think the main idea is: context is what matters, and RFC helps you get your (mine, I'm the founder) vision into people's heads a bit more. Therefore, people can be more autonomous and move on faster.

My current struggle is getting people to engage with my RFCs: that is, leaving comments. I certainly know they're not perfect, and I'd much rather know that before going into a meeting where we discuss the RFC. Especially so I can do any asynchronous research, or just spend some damned time thinking about it.

That being said, this is improving in my team. And I think the stakeholders above me appreciate it.

Linear should take notes. In my previous consulting and termed engagements the lack of good standardized spec templates has been a problem. Of course we ultimately made those but it took much more in-fighting than it should have. If the platform already offers a few baked templates then discussions are resolved much more quickly. Same as auto-formatting of code; metric tons of tiresome bikeshedding disappear almost overnight.

My favorite section in an RFC is "alternatives considered". I've seen many times that an option that had been initially discarded became the solution after review and discussion. It's also a great way to answer later questions about "why didn't you do X instead".

We use them internally. Works well, but it's not a replacement for a business requirement specification; however, it can help in the creation of a (better) one with greater buy-in from all stakeholders.

This seems to be a worthwhile exercise to explore -- formal but lightweight, and a lodestone during development.

That said, a ticketing system should ostensibly offer this same effect, but in my experience they're often populated with brief titles and maybe a short paragraph on expectations.

  • We usually use this before tickets are even created. In many cases, an RFC can lead to an epic-level ticket that includes multiple user stories.

    In other cases, it can be very tactical. I think you are right, probably it can be expressed directly in the form of a ticket. The discussion will happen in the comments to the ticket in this case.

    • A benefit of the RFC approach is that if it lives in the repo itself, the documentation is alongside the code.

      Anything that will prompt stakeholders to engage and clarify expectations is a win. It's hard to make this happen even when everybody is philosophically aligned on the need to do so, so reframing it as an RFC could be quite the useful "One Weird Trick" ;-).

  • Ticketing systems inevitably get Goodharted. Everyone starts in good faith, then the manager starts using the number of tickets closed, touched, etc. as a proxy for work being done, then the agents start optimizing for the number of tickets instead of things actually accomplished.

  • > That said, a ticketing system should ostensibly offer this same effect

    If the ticketing system were private to the engineers, it probably would serve the same purpose. But inevitably ticketing systems have wider visibility, and become a mechanism for signalling progress to management/product/sales... at which point engineers are actively incentivised not to put actual data in the ticketing system

  • > a short paragraph on expectations.

    or worse, multiple bulleted paragraphs full of expectations that have little to do with how the software actually works and is used by the actual users.

My interpretation is that this had little to do with specifics of RFCs, and everything to do with the verbal culture described.

I don’t care what format you use in terms of both process and also literal format. Just write shit down.

This guy invented a spec?

  • In our org, an RFC precedes a tech spec. The RFC literally is the "let's formally talk about this before we nail down a specification". For smaller specs, annotated comments can serve this purpose. Before this process, what we had found was no one was paying attention to tech discussions in our eng slack channels. Having an RFC gave us an inflection point where we could point back and say, "an official discussion happened, we decided to move forward with a spec".

Write a spec, or the developers will write one for you, in a much less clear language.

  • It's not a foregone conclusion. I've had great success writing BDD-style literate tests (not using a framework, just comments that follow a particular grammar). They've allowed verbally negotiating the approximate expected specification, and dealing with the realities of what comes up once you start writing code. But you end up with a spec, expressed as a number of BDD scenarios.

    Obviously this doesn't work for all software. But it suits systems with end-to-end behaviour and sequences of actions.

    So it's possible to leave it to developers, but expect a high quality spec along with the code.
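
For anyone wondering what "BDD-style literate tests, just comments that follow a particular grammar" might look like in practice, here is a rough sketch. Everything in it is an assumption made for illustration: the keyword grammar (Scenario/Given/When/Then/And), the `extract_spec` helper and the checkout example are hypothetical, not the commenter's actual setup.

```python
import re

# Assumed grammar: a comment line starting with one of these keywords
# counts as part of the spec. Any other comment is ignored.
STEP = re.compile(r"^\s*#\s*(Scenario|Given|When|Then|And)\b:?\s*(.*)$")


def extract_spec(source: str) -> list[str]:
    """Collect the BDD comment lines from test source code, in order."""
    spec = []
    for line in source.splitlines():
        match = STEP.match(line)
        if not match:
            continue
        keyword, text = match.groups()
        if keyword == "Scenario":
            spec.append(f"Scenario: {text}")
        else:
            # Indent the steps under their scenario heading.
            spec.append(f"  {keyword} {text}")
    return spec


# A hypothetical test written in the "literate" style. It is kept in a
# string here so the sketch is self-contained; in a real repo it would
# simply be a test file that the extractor reads from disk.
TEST_SOURCE = '''
def test_checkout_applies_discount():
    # Scenario: returning customer gets a discount
    # Given a cart totalling 100
    cart_total = 100
    # When a 10 percent loyalty discount is applied
    discounted = cart_total - cart_total * 10 // 100
    # Then the customer pays 90
    assert discounted == 90
'''

if __name__ == "__main__":
    # Run the test itself, then print the spec recovered from its comments.
    namespace = {}
    exec(TEST_SOURCE, namespace)
    namespace["test_checkout_applies_discount"]()
    print("\n".join(extract_spec(TEST_SOURCE)))
```

The point of the sketch is only that the test and the spec are the same artefact: the scenario text can be renegotiated as the code changes, and the readable spec falls out of the test files without a BDD framework.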