Comment by augusteo

1 month ago

> "So DuckDB was developed to allow queries for bigish data finally without the need for a cluster to simplify data analysis... and we now put it to a cluster?"

This is a fair point, but I think there's a middle ground. DuckDB handles surprisingly large datasets on a single machine, but "surprisingly large" still has limits. If you're querying 10 TB of Parquet files in S3, even DuckDB needs help.

The question is whether Ray is the right distributed layer for this. I'm curious what the alternative would be: Spark feels like overkill, but rolling your own coordination is painful.
