Comment by jeffbee
5 years ago
Making a flagrantly wasteful data format and then using its bloated extent as the numerator in your benchmark will not exactly be a fair comparison. If a protobuf has a packed, repeated field that looks like \x0a\x02\x7f\x7f and JSON instead has { "myFieldNameIsBob": [ 127, 127 ] }, the JSON interpreter has to be 20x faster just to stay even.
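For anyone who wants to check the size gap by hand, here is a rough sketch that encodes the packed field manually and counts bytes on both sides. It assumes the field is number 1 with varint elements (which is what the \x0a tag byte implies); the helper names `encode_varint` and `encode_packed_varints` are made up for illustration and are not part of any protobuf library.

```python
import json


def encode_varint(n):
    """Encode a non-negative int as a protobuf base-128 varint."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def encode_packed_varints(field_number, values):
    """Hypothetical helper: tag + length + concatenated varints."""
    payload = b"".join(encode_varint(v) for v in values)
    tag = encode_varint((field_number << 3) | 2)  # wire type 2 = length-delimited
    return tag + encode_varint(len(payload)) + payload


# Assumption: field number 1, matching the \x0a tag in the comment.
proto_bytes = encode_packed_varints(1, [127, 127])

# The JSON literal as written in the comment, and a whitespace-free version.
json_spaced = b'{ "myFieldNameIsBob": [ 127, 127 ] }'
json_compact = json.dumps(
    {"myFieldNameIsBob": [127, 127]}, separators=(",", ":")
).encode()

print(len(proto_bytes), len(json_compact), len(json_spaced))
```

For these exact literals the byte counts are 4, 30, and 36, so the multiplier in any particular case depends on field-name length, whitespace, and how many elements share one field name.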
That's true, would be interesting to see an "encoded entities per second" comparison. Or maybe a comparison with mostly stringy data where the size is probably comparable.
Article author here. I agree that would be a very enlightening benchmark. Protobuf can dump to JSON, so it shouldn't be too much work to dump my benchmark data to JSON and benchmark the parsing with simdjson. Maybe I'll see if I can get this done while this article is still on the front page. :)
Ah, wow, that's great. Apples-to-apples since you can dump the same data to JSON. And shows why HN remains a unique place. Wonder out loud, and maybe get an answer within an update to the article you just read :)