Comment by nopurpose

12 hours ago

True zero-copy is not achievable with Protobuf; you need something like FlatBuffers for that. What is presented here is closer to zero-allocation.

I also find this misleading, and it could so easily have been fixed by explaining that varints of course need decoding, and that fields are presumably decoded lazily, when they are requested, rather than eagerly (I didn't read the code).
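To illustrate why varints can't simply be read in place: a varint's value is spread across several bytes, 7 bits at a time, so the integer only exists once it has been reassembled and stored somewhere. The sketch below follows the Protobuf wire format; `decode_varint` is an illustrative name, not part of any Protobuf library API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>

// Decodes one Protobuf base-128 varint starting at buf[pos].
// Each byte contributes 7 payload bits, least-significant group first;
// a set high bit (0x80) means another byte follows.
// On success, returns the value and advances `pos`; on truncated or
// over-long input, returns std::nullopt and leaves `pos` unchanged.
std::optional<uint64_t> decode_varint(const uint8_t* buf, size_t len, size_t& pos) {
    uint64_t value = 0;
    size_t cur = pos;
    for (int shift = 0; shift < 64 && cur < len; shift += 7) {
        uint8_t byte = buf[cur++];
        value |= static_cast<uint64_t>(byte & 0x7F) << shift;
        if ((byte & 0x80) == 0) {
            pos = cur;
            return value;
        }
    }
    return std::nullopt;
}
```

For example, the bytes `0xAC 0x02` decode to 300: 44 (`0x2C`) from the low 7 bits of the first byte, plus `0x02 << 7` (256) from the second.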

Is this still true? Newer versions of protobuf can generate accessors that use `std::string_view` rather than `const std::string&` (which forces a copy) for `string` and `bytes` fields.

https://protobuf.dev/reference/cpp/string-view/

  • It allows avoiding allocations, but it doesn't allow using the serialised data as the backing memory for an in-language type. Protobuf varints have to be decoded and written out somewhere. They cannot be lazily decoded efficiently either: the order of fields in a serialised message is unspecified, so a lazy decoder either has to iterate over the message again and again to find each field on demand, or build a map of offsets, which negates the gains zero-copy is meant to deliver.

    • This is true, but the relative overhead depends heavily on the structure of one's schema. For example, fixed-width integer fields don't need to be decoded (including repeated ones), and the main idea of "zero copy" here is avoiding copies of string and bytes fields. If your protobufs are mostly varints, then yes, they all have to be decoded; if they contain a lot of string/bytes data, most of the decoding overhead could be memory copies of that data rather than varint decoding.

      In some message schemas even though this isn't truly zero copy it may be close to it in terms of actual overhead and CPU time, in other schemas it doesn't help at all.

  • Those field accessors take and return string_view, but they still copy. The official C++ library always owns the data internally and never aliases it, except in one niche case: the field type is Cord, the input is large and meets some other criteria, and the caller has used kParseWithAliasing, which is undocumented.

    To a very close approximation you can say that the official protobuf C++ library always copies and owns strings.

    • Well that is very disappointing news.

      Even the decoder makes a copy, even though it's returning a string_view? What's the point, then?

      I can understand encoders having to make copies, but not in a decoder.
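Several points in this thread (unspecified field order forcing a scan, fixed-width fields needing no varint decoding, and aliasing versus copying string/bytes data) can be illustrated with a small hand-written wire-format reader. This is a hypothetical sketch, not the official Protobuf C++ API: `find_field`, `get_fixed64`, `get_bytes_view` and `get_bytes_copy` are illustrative names, groups (wire types 3 and 4) are not handled, and validation is minimal.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <string_view>

// Reads one base-128 varint (7 payload bits per byte, high bit = "more bytes").
// Advances `pos` only on success.
static std::optional<uint64_t> read_varint(std::string_view buf, size_t& pos) {
    uint64_t value = 0;
    size_t cur = pos;
    for (int shift = 0; shift < 64 && cur < buf.size(); shift += 7) {
        uint8_t byte = static_cast<uint8_t>(buf[cur++]);
        value |= static_cast<uint64_t>(byte & 0x7F) << shift;
        if ((byte & 0x80) == 0) {
            pos = cur;
            return value;
        }
    }
    return std::nullopt;
}

// A located field: its wire type plus a view of its payload bytes,
// which still point into the caller's buffer.
struct RawField {
    uint32_t wire_type;
    std::string_view payload;
};

// Hypothetical on-demand lookup. Field order on the wire is unspecified, so
// every lookup walks the whole message from the start, skipping other fields.
// The last occurrence wins, as for singular scalar and string fields.
std::optional<RawField> find_field(std::string_view msg, uint32_t number) {
    std::optional<RawField> found;
    size_t pos = 0;
    while (pos < msg.size()) {
        auto tag = read_varint(msg, pos);
        if (!tag) return std::nullopt;
        uint32_t wire_type = static_cast<uint32_t>(*tag & 0x7);
        uint64_t field = *tag >> 3;
        size_t start = pos;
        uint64_t size = 0;
        switch (wire_type) {
            case 0:  // varint: payload is left in its encoded form
                if (!read_varint(msg, pos)) return std::nullopt;
                size = pos - start;
                break;
            case 1: size = 8; break;  // fixed64, sfixed64, double
            case 5: size = 4; break;  // fixed32, sfixed32, float
            case 2: {                 // length-delimited: string, bytes, messages
                auto len = read_varint(msg, pos);
                if (!len) return std::nullopt;
                start = pos;
                size = *len;
                break;
            }
            default: return std::nullopt;  // groups (3, 4) not handled here
        }
        if (size > msg.size() - start) return std::nullopt;  // truncated input
        pos = start + static_cast<size_t>(size);
        if (field == number) {
            found = RawField{wire_type, msg.substr(start, static_cast<size_t>(size))};
        }
    }
    return found;
}

// fixed64: a fixed-width little-endian read, with no varint decoding.
std::optional<uint64_t> get_fixed64(std::string_view msg, uint32_t number) {
    auto f = find_field(msg, number);
    if (!f || f->wire_type != 1) return std::nullopt;
    uint64_t v = 0;
    for (size_t i = 0; i < 8; ++i) {
        v |= static_cast<uint64_t>(static_cast<uint8_t>(f->payload[i])) << (8 * i);
    }
    return v;
}

// Zero-copy string/bytes access: the view aliases the serialized buffer
// and is only valid while that buffer is alive.
std::optional<std::string_view> get_bytes_view(std::string_view msg, uint32_t number) {
    auto f = find_field(msg, number);
    if (!f || f->wire_type != 2) return std::nullopt;
    return f->payload;
}

// What an owning parser does instead: copy the bytes into storage it owns.
// Handing out a string_view over this copy does not undo the copy.
std::optional<std::string> get_bytes_copy(std::string_view msg, uint32_t number) {
    auto v = get_bytes_view(msg, number);
    if (!v) return std::nullopt;
    return std::string(*v);
}
```

Every accessor call rescans the message from the start, which is the cost described above for lazy decoding. Building an offset table up front would avoid the rescans, but that is itself a full pass over the message.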