Comment by marginalia_nu
3 months ago
I migrated off Apache Parquet to a very simple columnar format. Cut processing times in half, reduced RAM usage by almost 90%, and (as it turns out) dodged this security vulnerability.
I don't want to be too harsh on the project, as it may simply not have been the right tool for my use case, but it sure gave me a lot of issues.
What "very simple columnar format" did you switch to?
https://github.com/MarginaliaSearch/SlopData
Writeup about some of the ideas that went into it:
https://www.marginalia.nu/log/a_112_slop_ideas/