
Comment by tyingq

5 years ago

"Applying this technique to protobuf parsing has yielded amazing results: we have managed to demonstrate protobuf parsing at over 2GB/s, more than double the previous state of the art."

I was somewhat surprised to see that was state of the art for protobufs. Simdjson boasts faster throughput, without the benefit of the length-encoded headers that are in protobufs. I looked for examples of protobufs using SIMD, but could only find examples of speeding up varint encode/decode.

Article author here. I dumped my test protobuf payload (7506 bytes) to JSON and got 19283 bytes.

So parsing this particular payload at 2GB/s would be equivalent to parsing the JSON version at 5.1GB/s.

SIMD doesn't end up helping protobuf parsing much. Due to the varint-heavy nature of the format, it's hard to extract much SIMD parallelism. Instead, our approach focuses on instruction-level parallelism, trying to remove instruction dependencies as much as possible. With this design we get ~3.5 instructions per cycle in microbenchmarks (which represents a best-case scenario).
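To make the dependency problem concrete, here is a minimal scalar varint decoder (a sketch, not the article's actual code): each iteration has to inspect the continuation bit of the byte it just read before it knows whether another byte follows, so the loop is one serial chain, which is what frustrates wide SIMD and motivates looking for instruction-level parallelism across independent fields instead.

    #include <cstdint>

    // Sketch of a scalar protobuf varint decoder. The data dependency runs
    // through the continuation bit: byte N+1 can't be consumed until the top
    // bit of byte N has been examined.
    const char *decode_varint(const char *p, const char *end, std::uint64_t *out) {
      std::uint64_t val = 0;
      int shift = 0;
      while (p < end && shift < 64) {
        std::uint8_t byte = static_cast<std::uint8_t>(*p++);
        val |= static_cast<std::uint64_t>(byte & 0x7f) << shift;
        if ((byte & 0x80) == 0) {  // continuation bit clear: value ends here
          *out = val;
          return p;
        }
        shift += 7;
      }
      return nullptr;  // truncated or overlong varint
    }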

Making a flagrantly wasteful data format and then using its bloated size as the numerator in your benchmark is not exactly a fair comparison. If a protobuf has a packed, repeated field that looks like \x0a\x02\x7f\x7f and the JSON instead has { "myFieldNameIsBob": [ 127, 127 ] }, the JSON interpreter has to be roughly 9x faster just to stay even.
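For what it's worth, the arithmetic behind that ratio is easy to check; the comments below are just the standard wire-format breakdown of those four bytes, set against the length of the JSON string (a throwaway sketch, not anything from the article):

    #include <cstdio>
    #include <cstring>

    int main() {
      // \x0a = tag (field 1, wire type 2: length-delimited), \x02 = length,
      // \x7f \x7f = the packed varints 127 and 127.
      const unsigned char pb[] = {0x0a, 0x02, 0x7f, 0x7f};
      const char *json = "{ \"myFieldNameIsBob\": [ 127, 127 ] }";
      std::printf("protobuf: %zu bytes, JSON: %zu bytes (%.1fx)\n",
                  sizeof pb, std::strlen(json),
                  double(std::strlen(json)) / sizeof pb);
      // prints: protobuf: 4 bytes, JSON: 36 bytes (9.0x)
      return 0;
    }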

  • That's true; it would be interesting to see an "encoded entities per second" comparison. Or maybe a comparison with mostly stringy data, where the sizes are probably comparable.

    • Article author here. I agree that would be a very enlightening benchmark. Protobuf can dump to JSON, so it shouldn't be too much work to dump my benchmark data to JSON and benchmark the parsing with simdjson. Maybe I'll see if I can get this done while this article is still on the front page. :)


I don't know if this is a fair comparison, as 2GB/s of protobuf will parse a lot more information than 2GB/s of JSON will, since protobuf is a much more space-efficient way to encode your data.

  • See https://news.ycombinator.com/item?id=26934063; we may get a fairly sound comparison. Though I imagine the comparison varies a lot depending on the data. As another comment mentioned, a message with a lot of large strings would lean heavily towards protobufs, and in that case the size wouldn't be much different.

    • I would just measure the size of the resulting data structures. In a language like C/C++ it could be as simple as comparing memory usage before and after your parsing function.
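      A minimal sketch of that idea on glibc (mallinfo2 requires glibc >= 2.33, and parse_payload is a made-up stand-in for whichever parser's output you want to size up):

          #include <malloc.h>   // glibc-specific: struct mallinfo2
          #include <cstddef>
          #include <cstdio>

          // Hypothetical parse entry point; substitute the protobuf or JSON
          // parser being measured.
          void *parse_payload(const char *buf, std::size_t len);

          void report_parsed_footprint(const char *buf, std::size_t len) {
            std::size_t before = mallinfo2().uordblks;  // heap bytes in use
            void *parsed = parse_payload(buf, len);
            std::size_t after = mallinfo2().uordblks;
            std::printf("parsed structures occupy ~%zu heap bytes\n",
                        after - before);
            (void)parsed;  // keep the result alive across the measurement
          }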

Isn't there also a significant difference in what the input is being parsed to? My expectation is for a protobuf library to parse messages to structs, with names resolved and constant-time field access. Simdjson parses a JSON object to an iterator, with field access being linear time and requiring string comparisons rather than just indexing to a known memory offset.

I.e. it seems like simdjson trades off performance at access time for making the parsing faster. Whether that tradeoff is good depends on the access pattern.
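A rough sketch of the two access patterns side by side. MyMessage, my_message.pb.h and the price field are made up for illustration, and this assumes simdjson's On Demand API with exceptions enabled:

    #include <string>
    #include <simdjson.h>
    #include "my_message.pb.h"  // hypothetical protoc-generated header

    void access_patterns(const std::string &json, const std::string &wire) {
      // simdjson On Demand: iterate() is cheap, but doc["price"] walks the
      // object and compares key strings at access time.
      simdjson::ondemand::parser parser;
      simdjson::padded_string padded(json);
      simdjson::ondemand::document doc = parser.iterate(padded);
      double p1 = double(doc["price"]);

      // protobuf: ParseFromString materializes the struct up front, and
      // price() afterwards is a constant-time member read at a known offset.
      MyMessage msg;
      msg.ParseFromString(wire);
      double p2 = msg.price();

      (void)p1; (void)p2;
    }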

  • But the same could be true for protobuf: decode fields only when you need them, and 'parse' just enough to find the field boundaries and cardinality. I did something like that for an internal protobuf-like tool, and with precomputed message profiles you can get amazing perf. Just grab the last or first bit of most bytes (vgather, if you're not on AMD) and you can do some magic, roughly along the lines of the sketch below.
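    One way to illustrate the "grab the top bit of every byte" idea, using an AVX2 byte movemask rather than a gather (my choice of instruction, not necessarily what the parent had in mind): for a run of packed varints, the zero bits of the mask mark the final byte of each varint, giving the boundaries without decoding any values yet.

        #include <immintrin.h>
        #include <cstdint>

        // Collect the continuation (top) bit of 32 consecutive bytes into a
        // 32-bit mask. Bits that come back 0 mark bytes that terminate a
        // varint, so ~mask gives the varint end positions in this chunk.
        inline std::uint32_t varint_end_mask(const std::uint8_t *p) {
          __m256i chunk = _mm256_loadu_si256(reinterpret_cast<const __m256i *>(p));
          std::uint32_t cont =
              static_cast<std::uint32_t>(_mm256_movemask_epi8(chunk));  // MSB of each byte
          return ~cont;  // 1 bits = varint terminator bytes
        }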

You cannot compare them directly. Decoding JSON essentially compresses it, and to compare fairly you would need to look at the resulting data structures and how long they take to produce. Strings are comparable, but integers are not: a d-digit decimal occupies d bytes as JSON text, while the decoded value only needs about d * log2(10), roughly 3.3d, bits (or a fixed-width machine word).

But yeah, I had the same first thought too.
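To put one concrete number on the integer case (a trivial sketch; the varint size is just 31 significant bits at 7 payload bits per byte):

    #include <cstdint>
    #include <cstdio>
    #include <cstring>

    int main() {
      const char *as_json = "1234567890";   // 10 bytes of JSON text
      std::int32_t as_binary = 1234567890;  // 4 bytes in memory
      // The same value as a protobuf varint takes 5 bytes.
      std::printf("JSON text: %zu bytes, int32: %zu bytes, varint: 5 bytes\n",
                  std::strlen(as_json), sizeof as_binary);
      return 0;
    }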

What benefit would length-encoded headers provide other than to reduce payload size? With JSON you just have to scan for whitespace, whereas with protobuf you actually have to decode the length field.

  • Not having to look at every byte to split out the contents.

    • Right, a protobuf message that consists of a single 100MB string will be "decoded" dramatically faster than 2GB/s, because you only need to visit the tag and the length, then store the payload's address and length, aliasing the input (see the sketch below).

      It's really quite impossible to discuss data encoding performance outside of a given schema.
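      A minimal sketch of that aliasing step (not any particular library's API): the only bytes ever read are the length header itself, and the payload is skipped in one pointer bump.

          #include <cstdint>
          #include <string_view>

          // "Parse" a length-delimited protobuf field by decoding only its
          // length varint, then aliasing the payload. A 100MB string is never
          // scanned or copied.
          const char *alias_bytes_field(const char *p, const char *end,
                                        std::string_view *out) {
            std::uint64_t len = 0;
            int shift = 0;
            for (;;) {  // decode the length varint
              if (p == end || shift >= 64) return nullptr;  // truncated/overlong
              std::uint8_t b = static_cast<std::uint8_t>(*p++);
              len |= static_cast<std::uint64_t>(b & 0x7f) << shift;
              if (!(b & 0x80)) break;
              shift += 7;
            }
            if (len > static_cast<std::uint64_t>(end - p)) return nullptr;
            *out = std::string_view(p, static_cast<std::size_t>(len));  // alias only
            return p + len;  // jump straight past the payload
          }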