Comment by esafak

8 hours ago

Any opinions on DuckLake?

I had a very good experience with it last year. I used it at large scale on data that had previously been in Iceberg, and it worked flawlessly. It has only improved since. Highly recommend.

The problem space that DuckLake solves is smaller, but it helped me get a working Metabase dashboard up quickly on ~1 TB of data with 128 GB of RAM. Queries were much, much faster than with all the alternatives.

Some downsides: no unique constraints with indexes (so you can accidentally shoot yourself in the foot with double ingestion), and writing is a bit cumbersome if you already have Parquet files.
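
For the double-ingestion point, here is a minimal sketch of one possible guard: stage the batch, then insert only keys that are not already in the target table. Everything in it is an assumption for illustration: the `events` table, the `event_id` key, and the `ingest_once` helper are hypothetical, and Python's built-in sqlite3 is used purely as a stand-in SQL engine so the snippet runs anywhere. The anti-join pattern is plain SQL, but it has not been checked against DuckLake here.

```python
import sqlite3


def ingest_once(conn, rows):
    """Insert only rows whose event_id is not already in `events`.

    `rows` is a list of (event_id, payload) tuples. Returns the number of
    rows actually added. Hypothetical helper, not a DuckLake API.
    """
    cur = conn.cursor()
    # Stage the incoming batch so the dedup can be done in one SQL statement.
    cur.execute(
        "CREATE TEMP TABLE IF NOT EXISTS staging (event_id TEXT, payload TEXT)"
    )
    cur.execute("DELETE FROM staging")
    cur.executemany("INSERT INTO staging VALUES (?, ?)", rows)

    before = cur.execute("SELECT COUNT(*) FROM events").fetchone()[0]
    # Anti-join: skip keys already present; GROUP BY collapses duplicate
    # keys inside the batch itself (keeping an arbitrary-but-stable payload).
    cur.execute(
        """
        INSERT INTO events (event_id, payload)
        SELECT s.event_id, MIN(s.payload)
        FROM staging AS s
        WHERE NOT EXISTS (
            SELECT 1 FROM events AS e WHERE e.event_id = s.event_id
        )
        GROUP BY s.event_id
        """
    )
    after = cur.execute("SELECT COUNT(*) FROM events").fetchone()[0]
    conn.commit()
    return after - before


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events (event_id TEXT, payload TEXT)")
    batch = [("a", "1"), ("b", "2")]
    print(ingest_once(conn, batch))  # first load inserts both rows
    print(ingest_once(conn, batch))  # re-running the same batch inserts nothing
```

This only protects a single writer; without a real unique constraint, two concurrent loads could still both pass the check.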

With my enterprise hat on, I'd say Athena + S3 is good enough. Only use DuckDB for ad hoc analysis.