Comment by orthoxerox

16 hours ago

The idea is that you treat data storage and data processing as two distinct tasks. You have your data in S3 or HDFS or a local directory and you run DuckDB on whatever single-node compute you have: a local machine or a container in a cluster.

There are companies that build cluster computing engines with DuckDB as the byte-cruncher at their heart, but usually it's used more like NumPy, Pandas or Polars on steroids. Or like SQLite, but for running OLAP queries.