Comment by gunnarmorling

17 days ago

I am working on a new Java parser for the Apache Parquet file format, with minimal dependencies and multi-threaded execution: https://github.com/hardwood-hq/hardwood.

I'm approaching the home stretch for a first 1.0 preview release, which includes support for parsing Parquet files with flat and nested schemas, all physical and logical column types, core and advanced encodings, projections, compression, multi-threading, and more, all with pretty decent performance.

Next on the roadmap are SIMD support, predicate push-down (bloom filters, statistics, etc.), and writer support.