Comment by willvarfar

10 hours ago

Exciting!

I have been on a similar odyssey making a 'zero copy' Java library that supports protobuf, parquet, thrift (compact) and (schema'd) json. It does allocate a long[] and break out the structure for O(1) access but doesn't create a big clump of object wrappers and strings and things; internally it just references a big pool buffer or the original byte[].

The speed demons use tail calls on rust and c++ to eat protobuf https://blog.reverberate.org/2021/04/21/musttail-efficient-i... at 2+GB/sec. In java I'm super pleased to be getting 4 cycles per touched byte and 500MB/sec.

Currently looking at how to merge a fast footer parser like this into the Apache Parquet Java project.