Comment by gunnarmorling
17 days ago
I am working on a new Java parser for the Apache Parquet file format, with minimal dependencies and multi-threaded execution: https://github.com/hardwood-hq/hardwood.
I'm approaching the home stretch for a first 1.0 preview release, which includes support for parsing Parquet files with flat and nested schemas, all physical and logical column types, core and advanced encodings, projections, compression, and multi-threading, all with pretty decent performance.
Next on the roadmap are SIMD support, predicate push-down (bloom filters, statistics, etc.), and writer support.