Comment by palata

4 days ago

I have been in a similar situation, and gRPC feels heavy. It comes with quite a few dependencies (nothing compared to npm or cargo ecosystems routinely bringing in hundreds, of course, but enough to be annoying when you have to cross-compile them). Also, at first it sounds like you will benefit from all the languages that protobuf supports, but in practice it's not that perfect: some Python package may rely on the C++ implementation, and therefore you need to compile it for your specific platform. Some language implementations are maintained by just one person in their free time (a great person, but still), etc.

On the other hand, I really like the design of Cap'n Proto, and the library is more lightweight (and hence easier to compile). But there, it is not clear which language implementations you can rely on other than C++. Also, there are clearly maintainers paid by Google working on gRPC, whereas for Cap'n Proto it's not so clear: it feels like it's essentially Cloudflare employees improving Cap'n Proto for Cloudflare. So if it works perfectly for your use case, that's great, but I wouldn't expect much support.

All that to say: my preferred choice would technically be Cap'n Proto, but I wouldn't dare make my company depend on it. Whereas nobody can fire me for depending on Google, I suppose.

> it feels like it's essentially Cloudflare employees improving Cap'n Proto for Cloudflare.

That's correct. At present, it is not anyone's objective to make Cap'n Proto appeal to a mass market. Instead, we maintain it for our specific use cases in Cloudflare. Hopefully it's useful to others too, but if you choose to use it, you should expect that if any changes are needed for your use case, you will have to make those changes yourself. I certainly understand why most people would shy away from that.

With that said, gRPC is arguably weird in its own way. I think most people assume that gRPC is what Google is built on, therefore it must be good. But it actually isn't -- internally, Google uses Stubby. gRPC is inspired by Stubby, but very different in implementation. So, who exactly is gRPC's target audience? What makes Google feel it's worthwhile to have 40ish(?) people working on an open source project that they don't actually use much themselves? Honest questions -- I don't know the answer, but I'd like to.

(FWIW, the story is a bit different with Protobuf. The Protobuf code is the same code Google uses internally.)

(I am the author of Cap'n Proto and also was the one who open sourced Protobuf originally at Google.)

  • My most vivid gRPC experience is from 10 years or so ago, so things have probably changed. We were heavily into Go and micro-services. We switched from, IIRC, protobuf over HTTP to gRPC "as it was meant to be used." Ran into a weird, flaky bug--after a while we'd start getting transaction timeouts. Most stuff would get through, but errors would build and eventually choke everything.

    I finally figured out it was a problem with specific pairs of servers. Server A could talk to C and D, but would time out talking to B. The gRPC call just... wouldn't.

    One good thing is you do have the source to everything. After much digging through amazingly opaque code, it became clear there was a problem with a feature we didn't even need. If there are multiple sub-channels between servers A and B, gRPC will bundle them into one connection. It also provides protocol-level in-flight flow limits, both for individual sub-channels and for the combined A-B bundle. It does this by using "credits". Every time a message is sent from A to B, it decrements the available credit limit for the sub-channel, and decrements another limit for the bundle as a whole. When the message is processed by the recipient process, the credits are added back to the sub-channel and bundle limits. Out of credits? Then you'll have to wait.

    The problem was that failed transactions were not credited back. Failures included processing time-outs. With time-outs the sub-channel would be terminated, so that wasn't a problem. The issue was with the bundle. The protocol spec was (is?) silent as to who owned the credits for the bundle, and who was responsible for crediting them back in failure cases. The gRPC code for Go, at the time, didn't seem to have been written or maintained by Google's most-experienced team (an intern, maybe?), and this step was simply dropped. The result was that the bundle got clogged, and A and B couldn't talk. Comm-level backpressure wasn't doing us any good (we needed full app-level backpressure), so for several years we'd just patch new Go libraries and disable it. (A toy sketch of the credit mechanism and this leak follows this list.)

  • While you are correct that the majority of Google services are internally Stubby-based, it isn't correct to say gRPC is not used inside Google. Cloud and external APIs are an obvious use case, but projects that are also used in the open source world use gRPC even when deployed internally; TensorFlow, among others, comes to mind...

    So "not used much" at Google scale probably justifies 40 people.

  • > What makes Google feel it's worthwhile to have 40ish(?) people working on an open source project that they don't actually use much themselves? Honest questions -- I don't know the answer, but I'd like to.

    It's at least used for the public Google Cloud APIs. That by itself guarantees a rather large scale, whether they use gRPC in prod or not.
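To make the flow-control failure described above concrete, here is a toy model in Go. This is not gRPC's actual code and all names are made up; it just sketches credit-based flow control with one window per sub-channel plus a shared "bundle" window, with the refund deliberately missing on the failure path:

```go
// Toy model of credit-based flow control: each sub-channel has its own
// credit window, and the whole A-B bundle shares one more. The bug:
// failures never refund the bundle window, so it leaks down to zero.
package main

import "fmt"

type window struct{ credits int }

func (w *window) take(n int)   { w.credits -= n }
func (w *window) refund(n int) { w.credits += n }

type bundle struct {
	shared  window          // one window for the whole A-B connection
	streams map[int]*window // one window per sub-channel
}

// send blocks (here: errors) unless both windows have enough credits.
func (b *bundle) send(stream, n int) error {
	sw := b.streams[stream]
	if sw.credits < n || b.shared.credits < n {
		return fmt.Errorf("stream %d blocked: out of credits", stream)
	}
	sw.take(n)
	b.shared.take(n)
	return nil
}

// ack models the receiver finishing a message: credits are returned to
// both the sub-channel window and the shared bundle window.
func (b *bundle) ack(stream, n int) {
	b.streams[stream].refund(n)
	b.shared.refund(n)
}

// fail models the buggy path: the sub-channel is torn down, so its own
// window no longer matters, but nobody refunds the shared window.
func (b *bundle) fail(stream, n int) {
	delete(b.streams, stream)
	// BUG: missing b.shared.refund(n) -- the bundle strands n credits.
}

func main() {
	b := &bundle{shared: window{credits: 10}, streams: map[int]*window{}}
	for i := 0; i < 6; i++ {
		b.streams[i] = &window{credits: 4}
		if err := b.send(i, 2); err != nil {
			fmt.Println(err) // the 6th request stalls forever
			continue
		}
		b.fail(i, 2) // every timeout strands 2 shared credits
	}
	fmt.Println("shared credits left:", b.shared.credits) // 0: clogged
}
```

Each timed-out request returns its sub-channel credits (by killing the sub-channel) but never its bundle credits, so healthy traffic between the same pair of servers eventually starves, exactly the A-can't-talk-to-B symptom in the story.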

If you are looking for a lightweight Protobuf based RPC framework, check out https://connectrpc.com/. It is part of the CNCF and is used and supported by multiple companies: https://buf.build/blog/connect-rpc-joins-cncf

gRPC ships with its own networking stack, which is one reason why those libs are heavy. Connect libraries leverage each ecosystem's native networking stack (e.g. net/http in Go, NSURLSession in Swift, etc.), which means any other libraries that work with the standard networking stack interop well with Connect.
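For illustration, here is roughly what that looks like in Go with connect-go. The greet.v1 service, its message types, and the example/gen import paths are hypothetical stand-ins for code that protoc-gen-connect-go would generate from your own schema; the point is that the generated handler mounts on a plain net/http mux:

```go
package main

import (
	"context"
	"log"
	"net/http"

	"connectrpc.com/connect"

	// Hypothetical packages generated by protoc-gen-go and
	// protoc-gen-connect-go from a greet.v1 schema.
	greetv1 "example/gen/greet/v1"
	"example/gen/greet/v1/greetv1connect"
)

type GreetServer struct{}

// Greet is an ordinary method; connect.Request/Response wrap the
// generated Protobuf messages, with no custom transport types involved.
func (s *GreetServer) Greet(
	ctx context.Context,
	req *connect.Request[greetv1.GreetRequest],
) (*connect.Response[greetv1.GreetResponse], error) {
	return connect.NewResponse(&greetv1.GreetResponse{
		Greeting: "Hello, " + req.Msg.Name,
	}), nil
}

func main() {
	mux := http.NewServeMux()
	// The generated helper returns a path and a plain http.Handler,
	// so it composes with any net/http middleware or router.
	path, handler := greetv1connect.NewGreetServiceHandler(&GreetServer{})
	mux.Handle(path, handler)
	// Plain HTTP/1.1 suffices for Connect-protocol clients; wrap the
	// mux with golang.org/x/net/http2/h2c to also serve gRPC clients
	// without TLS.
	log.Fatal(http.ListenAndServe("localhost:8080", mux))
}
```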