Comment by alamb
3 days ago
I think you would pick DataFusion over DuckDB if you want to customize it substantially. Not just with user defined functions (which are quite easy to write in DataFusion and are very fast), but things like:
* custom file formats (e.g. Spiral or Lance)
* custom query languages / SQL dialects
* custom catalogs (e.g. other than a local file or prebuilt DuckDB connectors)
* custom indexes (read only parts of Parquet files based on extra information you store)
* etc.
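To make the "custom indexes" item a bit more concrete, here is a rough plain-Python sketch of the general idea: keep a small side index of per-row-group min/max values and use it to decide which parts of which files need to be read. This is not DataFusion's API. The names `RowGroupStats` and `prune_row_groups`, the file names, and the index contents are all made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RowGroupStats:
    """Hypothetical index entry: min/max of one column in one row group."""
    file: str
    row_group: int
    min_value: int
    max_value: int


def prune_row_groups(index, lo, hi):
    """Return (file, row_group) pairs whose [min, max] range overlaps [lo, hi]."""
    return [
        (e.file, e.row_group)
        for e in index
        if e.max_value >= lo and e.min_value <= hi
    ]


# Made-up index contents, for illustration only.
INDEX = [
    RowGroupStats("a.parquet", 0, 0, 99),
    RowGroupStats("a.parquet", 1, 100, 199),
    RowGroupStats("b.parquet", 0, 200, 299),
]

# A filter like `id BETWEEN 150 AND 210` overlaps only two of the three row groups.
to_read = prune_row_groups(INDEX, 150, 210)
print(to_read)
```

In a real engine the index would live wherever suits you (a sidecar file, a database, a catalog) and the surviving row groups would be handed to the file reader; the sketch only shows the pruning decision itself.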
If you are looking for the nicest "run SQL on local files" experience, DuckDB is pretty hard to beat.
Disclaimer: I am the PMC chair of DataFusion
There are some other interesting FAQs here too: https://datafusion.apache.org/user-guide/faq.html