Working on complex systems: What I learned working at Google

5 months ago (thecoder.cafe)

146 comments

0xKelsey

One of my pet peeves with the usage of complex(ity) out of the traditional time/space in computer science is that most of the time the OPs of several articles over the internet do not make the distinction between boundaried/arbitrary complexity, where most of the time the person has most of the control of what is being implemented, and domain/accidental/environmental complexity, which is wide open and carries a lot of intrinsic and most of the time unsolvable constraints.

Yes, they are Google; yes, they have a great pool of talent around; yes, they do a lot of hard stuff; but most of the time when I read those articles, I miss those kinds of distinctions.

Not lowballing the guys at Google, they do amazing stuff, but in some domains of domain/accidental/environmental complexity (e.g. sea logistics, manufacturing, industry, etc.) where most of the time you do not have the data, I believe that they are way more complex/harder than most of the problems that the ones that they deal with.

kubb 5 months ago
I’d wager 90% time spent at Google is fighting incidental organizational complexity, which is virtually unlimited.
- repeekad 5 months ago
  
  The phrase thrown around was “collaboration headwind”, the idea was if project success depends on 1 person with a 95% chance of success, project success also had a 95% chance. But if 10 people each need to succeed at a 95% chance, suddenly the project success likelihood becomes 60%…
  In reality, lazy domain owners layered on processes, meetings, documents, and multiple approvals until it took 6 months to change the text on a button, ugh
  
  32 replies →
- Demiurge 5 months ago
  
  And when you’re at a smaller company 90% of your time is fighting societal complexity, limit of which also approaches infinity, but at a steeper angle.
  No greater Scott’s man can tell you that the reality is surprisingly complex, and sometimes you have resources to organize and fight them, and sometimes you use those resources wiser than the other group of people, and can share the lessons. Sometimes, you just have no idea if your lesson is even useful. Let’s judge the story on its merits and learn what we can from it.
  
  4 replies →
- simianwords 5 months ago
  
  Equally important is the amount of time they save because of available abstractions to use like infra, tooling etc
  
  1 reply →
tuyiown 5 months ago
I think this is addressed with the complex vs complicated intro. Most problems with uncontrolled / uncontrollable variables will be approached with an incremental solution, e.g. you'll restrict those variables voluntarily or involuntarily and let issues being solved organically / manually, or automatisation will be plain and simple being abandoned.
This qualify as complicated. Delving in complicated problems is mostly driven by business opportunity, always has limited scaling, and tend to be discarded by big players.
- braza 5 months ago
  
  I don't think this is adequately addressed by the "complicated vs. complex" framing—especially not when the distinction is made using reductive examples like taxes (structured, bureaucratic, highly formalized) versus climate change (broad, urgent, signaling-heavy).
  That doesn’t feel right.
  Let me bring a non-trivial, concrete example—something mundane: “ePOD,” which refers to Electronic Proof of Delivery.
  ePOD, in terms of technical implementation, can be complex to design for all logistics companies out there like Flexport, Amazon, DHL, UPS, and so on.
  The implementation itself—e.g., the box with a signature open-drawing field and a "confirm" button—can be as complex as they want from a pure technical perspective.
  Now comes, for me at least, the complex part: in some logistics companies, the ePOD adoption rate is circa 46%. In other words, in 54% of all deliveries, you do not have a real-time (not before 36–48 hours) way to know and track whether the person received the goods or not. Unsurprisingly, most of those are still done on paper. And we have:
  - Truck drivers are often independent contractors.
  - Rural or low-tech regions lack infrastructure.
  - Incentive structures don’t align.
  - Digitization workflows involve physical paper handoffs, WhatsApp messages, or third-party scans.
  So the real complexity isn't only "technical implementation of ePOD" but; "having the ePOD, how to maximize it's adoption/coverage with a lot uncertainty, fragmentation, and human unpredictability on the ground?".
  That’s not just complicated, it’s complex 'cause we have: - Socio-technical constraints,
  - Behavioral incentives,
  - Operational logistics,
  - Fragmented accountability,
  - And incomplete or delayed data.
  We went off the highly controlled scenario (arbitrarily bounded technical implementation) that could be considered complicated (if we want to be reductionist, as the OP has done), and now we’re navigating uncertainty and N amount of issues that can go wrong.
  
  1 reply →
- __MatrixMan__ 5 months ago
  
  I don't think it is, because the intro gets it wrong. If a problem's time or space complexity increases from O(n^2) to O(n^3) there's nothing necessarily novel about that, it's just... more.
  Complicated on the other hand, involves the addition of one or more complicating factors beyond just "the problem is big". It's a qualitative thing, like maybe nobody has built adequate tools for the problem domain, or maybe you don't even know if the solution is possible until you've already invested quite a lot towards that solution. Or maybe you have to simultaneously put on this song and dance regarding story points and show continual progress even though you have not yet found a continuous path from where you are to your goal.
  Climate change is both, doing your taxes is (typically) merely complex. As for complicated-but-not-complex, that's like realizing that you don't have your wallet after you've already ordered your food: qualitatively messy, quantitatively simple.
  To put it differently, complicated is about the number of different domains you have to consider, complex is about--given some domain--how difficult the consideration in that domain are.
  Perhaps the author's usage is common enough in certain audiences, but it's not consistent with how we discuss computational complexity. Which is a shame since they are talking about solving problems with computers.
rawgabbit 5 months ago

If you consider their history of killing well loved products and foisting unwarranted products such as Google Plus onto customers, Google is for lack of a better word just plain stupid. Google is like a person with an IQ of 200 but would get run over by oncoming traffic because they have zero common sense.
williamdclt 5 months ago

I've not seen "accidental" complexity used to mean "domain" (or "environmental" or "inherent") complexity before. It usually means "the complexity you created for yourself and isn't fundamental to the problem you're solving"
tanelpoder 5 months ago

Also, anything you do with enterprise (cloud) customers. People like to talk about scale a lot and data people tend to think about individual (distributed) systems that can go webscale. A single system with many users is still a single system. In enterprise you have two additional types of scale:
1) scale of application variety (10k different apps with different needs and history)
2) scale of human capability (ingenuity), this scale starts from sub-zero and can go pretty high (but not guaranteed)
mwbajor 5 months ago
Im a HW engineer and don't really understand "complexity" as far as this article describes it. I didn't read it in depth but it doesn't really give any good examples with specifics. Can someone give a detailed example of what the author is really talking about?
- junto 5 months ago
  
  Cynefin framework:
  https://en.m.wikipedia.org/wiki/Cynefin_framework
- aweiher 5 months ago
  
  System Thinking 101
TexanFeller 5 months ago

Rich Hickey is famous for talking about easy vs. simple/complex and essential vs. incidental complexity.
“Simple Made Easy”: https://youtu.be/SxdOUGdseq4?si=H-1tyfL881NawCPA

dmoy 5 months ago

> My immediate reaction in my head was: "This is impossible". But then, a teammate said: "But we're Google, we should be able to manage it!".

Google, where the impossible stuff is reduced to merely hard, and the easy stuff is raised to hard.

dijit 5 months ago
This is probably the most accurate statement possible.
“I just want to store 5TiB somewhere”
“Ha! Did you book multiple bigtable cells”
https://youtu.be/3t6L-FlfeaI?si=C5PJcrvLepABZsVF
- Phelinofist 5 months ago
  
  What are peer-bonuses?
  
  25 replies →
cmrdporcupine 5 months ago

Or "How many MDB groups do I need to get approved to join over multiple days/weeks, before I can do the 30 second thing I need to do?"
Do not miss
newsclues 5 months ago
“the difficult we do immediately. The impossible takes a little longer” WW2 US army engineer corp
- fuzzfactor 5 months ago
  
  >“the difficult we do immediately. The impossible takes a little longer”
  This was posted in my front office when I started my company over 30 years ago.
  It was a no-brainer, same thing I was doing for my employer beforehand. Experimentation.
  By the author's distinction in the terminology, if you consider the complexity relative to the complications in something like Google technology, it is on a different scale compared to the absolute chaos relative to the mere remaining complexity when you apply it to natural science.
  I learned how to do what I do directly from people who did it in World War II.
  And that was when I was over 40 years younger, plus I'm not done yet. Still carrying the baton in the industrial environment where the institutions have a pseudo-military style hierarchy and bureaucracy. Which I'm very comfortable working around ;)
  Well, the army is a massive mainstream corp.
  There are always some things that corps don't handle very well, but generals don't always care, if they have overwhelming force to apply, lots of different kinds of objectives can be overcome.
  Teamwork, planning, military-style discipline & chain-of-command/org-chart, strength in numbers, all elements which are hallmarks of effective armies over the centuries.
  The engineers are an elite team among them. Traditionally like the technology arm, engaged to leverage the massive resources even more effectively.
  The bigger the objective, the stronger these elements will be brought to bear.
  Even in an unopposed maneuver, steam-rolling all easily recognized obstacles more and more effectively as they up the ante, at the same time bigger and bigger unscoped problems accumulate which are exactly the kind that can not be solved with teamwork and planning (since these are often completely forbidden). When there must be extreme individual ability far beyond that, and it must emanate from the top decision-maker or have "equivalent" access to the top individual decision-maker. IOW might as well not even be "in" the org chart since it's just a few individuals directly attached to the top square, nobody's working for further promotions or recognition beyond that point.
  When military discipline in practice is simply not enough discipline, and not exactly the kind that's needed by a long shot.
  That's why even in the military there are a few Navy Seals here and there, because sometimes there are serious problems that are the kind of impossible that a whole army cannot solve ;)
- brap 5 months ago
  
  “and the easy... well, that’s not a good promo artifact, so never”

gilbetron 5 months ago

I've interviewed many current and ex Googlers, and one thing we've discovered is that we have to be careful overindexing on the scale and complexity of systems they work on. Google is insanely huge and complex, but have insane and complex tooling to help developers. "I worked on a project that affected 250 million users" is something we'll here and sounds amazing, but in reality, from their perspective, they spent months working through the complex Google dev, QA, and deployment process and pushed out a relatively straightforward change, but that change was for a massive system.

They have a unique and distinctive experience, but it usually isn't what you expect. It is rare to encounter someone from Google that actually built something of significance, and those that have are always at the staff+ level and had been there 10+ years.

If I were to make another generalization, the [g|x]ooglers that worked on relatively "small" projects are often the most interesting, as they had resources to build something from the ground up and do attempt some really interesting projects.

neilv 5 months ago

> My immediate reaction in my head was: "This is impossible". But then, a teammate said: "But we're Google, we should be able to manage it!".

"We can do it!" confidence can be mostly great. (Though you might have to allow for the possibility of failure.)

What I don't have a perfect rule for is how to avoid that twisting into arrogance and exceptionalism.

Like, "My theory is correct, so I can falsify this experiment."

Or "I have so much career potential, it's to everyone's advantage for me to cheat to advance."

Or "Of course we'll do the right thing with grabbing this unchecked power, since we're morally superior."

Or "We're better than those other people, and they should be exterminated."

Maybe part of the solution is to respect the power of will, effort, perseverance, processes, etc., but to be concerned when people don't also respect the power and truth of humility, and start thinking of individual/group selves as innately superior?

yodsanklai 5 months ago
Sorry to say that, but this sounds a bit like a fantasy. I think the vast majority of Google employees don't see themselves as particularly brillant or special. Even there, lots of people have imposter syndrome.
Actually, I've found this is a constant in life, whatever you achieve, you end up in a situation where you're pretty average among your peers. You may feel proud to get into Google for a few months, and then you're quickly humbled down.
- neilv 5 months ago
  
  Understood, but I meant to ask about the more general problem -- not specific to Google, only prompted by that quote.
  (Also, to be clear about my examples: I don't think Google is fabricating their dissertation research, nor do I think Google is genocidal.)
  If you're suggesting there's a lot of humility, yet "But we can do it!" still works, that's great, and I'd be interested in more nuances of that.

RenThraysk 5 months ago

There is a certain amount of irony when the cookie policy agreement is buggy on a story about complicated & complex systems.

Clicking on "Only Necessary" causes the cookie policy agreement to reappear.

amdivia 5 months ago

Had the same issue
wooque 5 months ago
Same here, it's because you have third party cookies blocked.
- junto 5 months ago
  
  My assumption with bugs like this is that they are “geographically based edge cases” that have been poorly tested due to engineers not being in the right location to test it, but affects a large number of users without throwing a error that can be logged.
  GDPR banner only to be used in EU, with conditional of only accepting non-essential cookies, and the engineer or QA is based in the U.S.
  As a side note, as someone that lives in the EU my pattern of usage here is:
  - choose only non-essential, but if not a presented option then
  - reject all cookies, but if no reject all available then
  - switch the reader mode (or hide distracting items), or if not possible then
  - close tab
  I’m getting much more aggressive when dealing with cookie banners dark patterns. I will not a third third party advertising cookies as much as possible and support websites that allow me an easy way to opt out of them.
jajko 5 months ago

Not for me, on Chrome now
CommenterPerson 5 months ago

It didn't appear on DuckDuckGo either, Thanks.
nonethewiser 5 months ago

I dont see a cookie banner. Thankfully.

ggm 5 months ago

I think there are two myths applicable here. Probably more.

One myth is that complex systems are inherently bad. Armed forces are incredibly complex. That's why it can take 10 or more rear echelon staff to support one fighting soldier. Supply chain logistics and materiel is complex. Middle ages wars stopped when gunpowder supplies ran out.

Another myth is that simple systems are always better and remain simple. They can be, yes. After all, DNA exists. But some beautiful things demand complexity built up from simple things. We still don't entirely understand how DNA and environment combine. Much is hidden in this simple system.

I do believe one programming language might be a rational simplification. If you exclude all the DSL which people implement to tune it.

jcranmer 5 months ago

> Middle ages wars stopped when gunpowder supplies ran out.
The arquebus is the first mass gunpowder weapon, and doesn't see large scale use until around the 1480s at the very, very tail end of the Middle Ages (the exact end date people use varies based on topic and region, but 1500 is a good, round date for the end).
In Medieval armies, your limiting factor is generally that food is being provided by ransacking the local area for food and that a decent portion of your army is made up of farmers who need to be back home in the harvest season. A highly competent army might be able to procure food without acting as a plague on all the local farmlands, but most Medieval states lacked sufficient state capacity to manage that (in Europe, essentially only the Byzantines could do that).
zmb_ 5 months ago
Following the definition from the article, armed forces seems like a complicated system, not a complex one. There is a structured, repeatable solution for armed forces. It does not exhibit the hallmark characteristics of complex systems listed in the article like emergent behaviors.
- cowboylowrez 5 months ago
  
  not a fan of the article for this reason alone. good points made, but no reason to redefine perfectly good words when we already have words that work fine.
p_v_doom 5 months ago

Agreed. The problem is not complexity. Every system must process a certain amount of information. And the systems complexity must be able to match that amount. The fundamental problem is about designing systems that can manage complexity, especially runaway complexity.
jajko 5 months ago

> Middle ages wars stopped when gunpowder supplies ran out
Ukraine would be conquered by russia rather quickly if russians weren't so hilariously incompetent in these complex tasks, and war logistics being the king of them. Remember that 64km queue of heavy machinery [1] just sitting still? This was 2022, and we talk about fuel and food, the basics of logistics support.
[1] https://en.wikipedia.org/wiki/Russian_Kyiv_convoy

destring 5 months ago

I come from a math background so it’s a bit surprising when software engineers don’t have such basic that systems modeling vocabulary. Everyone should give Thinking in Systems by Meadows a read

Pavilion2095 5 months ago

The cookie banner reappears indefinitely on this website when I click 'only necessary' lol.

teivah 5 months ago
Sorry about that, I'm my newsletter provider (Substack) which is very buggy sometimes.
- romanovcode 5 months ago
  
  Probably because it is overly complex system.
  
  2 replies →
nonethewiser 5 months ago
Thankfully I dont see a cookie banner at all. Did you try moving continents?
- EasyMark 5 months ago
  
  or ublock and adding all the cookie blocker lists

kossTKR 5 months ago

  "This is one possible characteristic of complex systems: they behave in ways that can hardly be predicted just by looking at their parts, making them harder to debug and manage."

To be honest this doesn't sound too different from many smaller and medium sized internetprojects i've worked on, because of the asynchronous nature of the web, with promises, timing issues and race conditions leading to weirdness that's pretty hard to debug because you have to "playback" with the cascading randomness of request timing, responses, encoding, browser/server shenanigans etc.

tunesmith 5 months ago

I think the definitions of complex/complicated get muddled with the question of whether something is truly a closed system. Often times something is defined as "complex" when all they mean is that their model doesn't incorporate the externalities. But I don't know if I've come across a description of a truly closed system that has "emergent behavior". I don't know if LLMs qualify.

gilleain 5 months ago

Mostly overlapping definition of what a 'complex system' is with :

https://en.wikipedia.org/wiki/Complex_system

although I understood the key part of a system being complex (as opposed to complicated) is having a large number of types of interaction. So a system with a large number of parts is not enough, those parts have to interact in a number of different ways for the system to exhibit emergent effects.

Something like that. I remember reading a lot of books about this kind of thing a while ago :)

ninetyninenine 5 months ago

Except computers attempt to model mathematics in an ideal world.

Unless your problem comes from something side effects on a computer that can’t be modeled mathematically there is nothing technically stopping you from modeling the problem as mathematical problem then solving that problem via mathematics.

Like the output of the LLM can’t be modeled. We literally do not understand it. Are the problems faced by the SRE exactly the same? You give a system an input of B and you can’t predict the output of A mathematically? It doesn’t even have to be a single equation. A simulation can do it.

ratorx 5 months ago
I think the vast majority of SRE problems are in the “side effects” category. But higher level than the hardware-level side effects of the computer that you might be imagining.
The core problem is building a high enough fidelity model to simulate enough of the real world to make the simulation actually useful. As soon as you have some system feedback loops, the complexity of building a useful model skyrockets.
Even in “pure” functions, the supporting infrastructure can be hard to simulate and critical in affecting the outputs.
Even doing something simple like adding two numbers requires an unimaginable amount of hidden complexity under the hood. It is almost impossible for these things to not have second-order effects and emergent behaviour under enough scale.
- ninetyninenine 5 months ago
  
  Can you give me an example of some problem that emerged that was absolutely unpredictable.

quantum_state 5 months ago

The mentioned characteristics and more of complex systems have been known and studied since about half century ago in physics and biology. The deal with large organizations or large code bases is yet another example.

polotics 5 months ago

I think you are using hysteresis when actually meaning more general path-dependency.

carom 5 months ago

There are typos and rough grammar in the first few paragraphs and I am actually very happy about that because I know I'm not reading LLM slop.

Zoethink 5 months ago

[dead]

CommenterPerson 5 months ago

Interesting, Thanks to the writer.

However, all this amazing stuff in the service of .. posting ads ?

bitpush 5 months ago

Doctors are doing surgery .. for earning money?
Dont be reductive.
EasyMark 5 months ago

Google does a lot besides posting ads...

hiddencost 5 months ago

This is all exacerbated by a ton of the ML stack being in Python, for some god Forsaken reason.

bitpush 5 months ago

How is the choice of language the cause of anything complex/complicated?
Both python and rust (for instance) are both turing complete, and equally capable

nottorp 5 months ago

Let's add a post scriptum:

Whatever you're working on, your project is not likely to be at Google's scale and very unlikely to be a "complex system".

pentaphobe 5 months ago
Let's add a post post scriptum :)
Just because your project might not be at Google's scale doesn't mean it is therefore also not complex [^1]
Example: I'd say plenty of games fit the author's definition of "complex systems". Even the well-engineered ones (and even some which could fit on a floppy disc)
[1]: https://en.m.wikipedia.org/wiki/Affirming_the_consequent
- globalnode 5 months ago
  
  Speaking of games, why hasn't google made a game. They could create a gaming division and well... make one. Amazon did. I wonder why they haven't.
  
  7 replies →
nasretdinov 5 months ago

IMO even a more interesting observation is that even Google itself doesn't necessarily work on large scale, e.g. many regionalised services in Google Cloud don't have _that_ many requests in each region, allowing for a much simpler architecture compared to behemoths like GMail or Maps
octo888 5 months ago

Don't underestimate my colleagues' abilities to turn the simple into the complex!
vendiddy 5 months ago
Managing complexity pays off sooner than one would think.
Even a project that's like 15k lines of code would benefit from a conscious effort to fight against complexity.
- 0xKelsey 5 months ago
  
  100%
p_v_doom 5 months ago

IMO what we term "complex" tends to be that which the current setup/system struggles to deal with or manage. Relatively speaking google has much much higher complexity, but it doesnt matter as much, because even in simpler cases we are dealing with huge amount of variety and possible states, and the principles of managing that remain the same regardless of scale.
citrin_ru 5 months ago
For small scale one can build a simple system but I see many are trying to copy FAANG architecture anyway. IMHO it’s a fallacy - people think that if they’ll would copy architecture used by google their company will be successful like google. I think it other was around - google has to build complex systems because it has many users.
- nottorp 5 months ago
  
  Yes, it's called "cargo cult" and it applies to a lot of architecture and processes decisions in IT :)
- ungreased0675 5 months ago
  
  It’s an infectious disease among developers. Some people would spend weeks making a simple landing page, and it would require at least 3 different cloud services.
  
  1 reply →
prmph 5 months ago

Complex is orthogonal to Large. Some small to medium scale systems address an incredibly complex problem space. Some large systems are solving relatively simple problems. Of course I do agree that size introduces it's own complexity.