Comment by zem

1 year ago

> the rule at google is if you check any code in, you are responsible for every single breakage it causes

i can no longer edit my post to clarify this, but by "responsible for breakages" i meant that if your new check-in caused any CI tests anywhere within the codebase to fail, even if due to bugs in the other code, you had to stop and fix it, or get the owners to fix it, or find some principled way to temporarily disable those tests, before you could check your code in.

this was a very real issue for things like the python runtime or widely used code checkers like pytype or pylint, because if e.g. the new version of python, or some improved check in pytype, started raising failures in a code pattern that was technically wrong but which the existing toolchain did not complain about, you could not release the new version until you had fixed all those new breakages.

contrast this with non-monorepo codebases, where the owners of the other code would have had to deal with it themselves ("oh, our code works under 3.10 but 3.11 broke it, guess we have to pin our own repo to 3.10 until we fix it"), while we as the python team could just say "we support 3.11 now, you need to catch up".

5 comments

Arainach 1 year ago

> i can no longer edit my post to clarify this, but by "responsible for breakages" i meant that if your new check-in caused any CI tests anywhere within the codebase to fail, even if due to bugs in the other code, you had to stop and fix it, or get the owners to fix it, or find some principled way to temporarily disable those tests, before you could check your code in.

Google engineers hate this one weird hack: just send a mail saying "we're deleting/deprecating [everything your product depends on] in [14-180] days, good luck!"

pjc50 1 year ago

I'm guessing these are related. If every team is potentially responsible for breakages in every product, there's no such thing as a "just chugging along" product and there will be a constant demand to delete products that are not popular within Google.
- Arainach 1 year ago
  
  It has nothing to do with popularity; it has everything to do with teams doing what gets them promoted and not having to deal with cross-org fallout.
  At Microsoft, if there were 12 teams depending on service foo, the team who owned foo had to have a deprecation plan that included sufficient time and support for all 12 teams to get off foo, and directors/VPs would meet and come to terms before the deprecation started.
  Google is the opposite. Despite an increasing trend towards top-down direction there's no requirement or support for involving stakeholders. There can be internal products (let's say a data storage solution) and the team who owns it can say "we don't care about this anymore, it's being shut down in 6 months" even if dozens of other teams and millions of end users rely on it, and it's up to those teams to scramble, put other projects on hold, and try to migrate as fast as possible to avoid an outage.
  It's frankly demotivating. For the last 18 months, a huge percentage of eng time on my team has gone to either compliance or mandatory migration work, i.e. stuff that, ABSOLUTE BEST CASE, customers don't notice.

mathstuf 1 year ago

Don't forget to include the standard detailed changelog: "bugfixes and performance improvements".

lbruno 1 year ago

this reminds me of kenton v's complaint about working on infra being so hard