Comment by PessimalDecimal
10 days ago
> Everything was good in the beginning, as long as everyone submitted their .proto to a centralized repo. Once one team starts to host its own, things break quickly.
Is this an issue with protobufs per se though? It's a data schema. How are people supposed to develop to a shared schema if a team doesn't - you know - share their schema? That could happen with any other particular choice for how schemas are defined.
It's a problem with PB because it requires everything to be typed (unless you use Any), which requires all middleware to eagerly type check all data passing through. With JSON, validation is typically done only by the endpoints, which allows for much faster development.
There was a blog post a few years ago where an engineer working on the Google Cloud console complained that simply adding a checkbox to one of the pages required modifying ~20 internal protos and 6 months of rollout. That's an obvious downside that I wish I knew how to fix.
My guess is there's more to that story than just "protobufs don't forward unknown fields", because that's not how they work by default. Take a look at https://protobuf.dev/programming-guides/proto3/#unknowns.
https://kmcd.dev/posts/protobuf-unknown-fields/ discusses the scenario you're hinting at.
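To make the unknown-fields point concrete, here is a rough sketch in plain Python. It is not the real protobuf library: it is a toy encoder that only handles varint fields, and the function names (`encode`, `parse`, `serialize`) are made up for illustration. The only thing borrowed from the actual wire format is the tag layout, (field number << 3) | wire type. It shows a middleware that knows field 1 only, yet still passes field 2 through untouched.

```python
# Toy model of protobuf-style unknown-field preservation (varint fields only).
# NOT the real protobuf library; all function names here are hypothetical.

def encode_varint(n):
    """Encode a non-negative int as a base-128 varint."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)

def decode_varint(buf, pos):
    """Decode a varint starting at pos; return (value, next_pos)."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7

def encode(fields):
    """Serialize {field_number: int_value} using wire type 0 (varint)."""
    out = b""
    for number, value in fields.items():
        out += encode_varint((number << 3) | 0) + encode_varint(value)
    return out

def parse(buf, known_numbers):
    """Split a message into known fields and raw bytes of unknown fields."""
    known, unknown = {}, b""
    pos = 0
    while pos < len(buf):
        start = pos
        tag, pos = decode_varint(buf, pos)
        value, pos = decode_varint(buf, pos)
        if (tag >> 3) in known_numbers:
            known[tag >> 3] = value
        else:
            unknown += buf[start:pos]  # keep the bytes verbatim
    return known, unknown

def serialize(known, unknown):
    """Re-serialize known fields and append the preserved unknown bytes."""
    return encode(known) + unknown

# Frontend sets field 2 (the new checkbox); middleware only knows field 1.
from_frontend = encode({1: 42, 2: 1})
known, unknown = parse(from_frontend, known_numbers={1})
forwarded = serialize(known, unknown)
at_backend, _ = parse(forwarded, known_numbers={1, 2})
```

After the round trip through the old middleware, `at_backend` still contains field 2. That only works because the middleware re-serializes the same message it received.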
It's possible in the story you mention that each of those ~20 internal protos was a different message, and each hop between backends was translating data between nearly identical schemas. In that case, they'd all need to be updated to transport that data. But that's a different problem, and the result of those engineers' choice for how to structure their service definitions.
The problem is different. Protobuf's unknown field support is useful if you want to forward a message in its entirety, and it allows you to copy an input message even though it has fields unknown to the middleware. The problem arises because at Google, in order to minimize payloads and storage sizes, they almost always create "intermediate" protobufs that are only used by middleware to talk to other middleware.
Example:
The service that manages the web frontend knows that the new checkbox is auth-related and therefore it has to go into the WebServiceAuthRequest PB message, but it doesn't have the new schema of the WebServiceAuthRequest message with the checkbox field, so it can't create a WebServiceAuthRequest message because it doesn't know which numeric ID to use for the value.
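A rough Python sketch of that situation, with dataclasses standing in for generated protobuf classes. Everything except the `WebServiceAuthRequest` name is hypothetical (the `user` and `remember_me` fields, the `build_auth_request` helper, the dict payload from the UI). The point is only that when a service constructs a new intermediate message from its own compiled schema, there is nowhere to put a field that schema doesn't have yet.

```python
from dataclasses import dataclass, fields

# Stand-in for the OLD schema compiled into the web frontend service.
# Field names are hypothetical; the newer schema would add `remember_me`.
@dataclass
class WebServiceAuthRequest:
    user: str

def build_auth_request(ui_payload):
    """Build the intermediate message from whatever the UI sent.

    Only fields present in the compiled schema can be set; anything
    else has no slot (and no field number) in this version of the message.
    """
    known = {f.name for f in fields(WebServiceAuthRequest)}
    return WebServiceAuthRequest(
        **{k: v for k, v in ui_payload.items() if k in known}
    )

# The JS code already sends the new checkbox value...
ui_payload = {"user": "alice", "remember_me": True}
# ...but it is lost as soon as the intermediate message is built.
request = build_auth_request(ui_payload)
```

Unlike the forwarding case, unknown-field preservation doesn't help here, because `request` is a freshly built message rather than a re-serialized copy of the input.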
The "common wisdom" at Google was that you have to add a new field starting at the leaves (the storage backends) and work your way up to the middleware, then the web frontends and finally the JS code. And yes, in the worst case it can take two quarters and modifying 20 intermediate services (each with its own ServiceFooRequest protobuf) just to add a new checkbox in the UI.
And in writing this I came up with a way to avoid the problem, but it would require an incompatible change to the PB wire format. Hmmm...