Comment by steve_adams_86

16 hours ago

The most interesting use case lately has been using it as the transformation and validation engine for a CLI that handles scientific data. Some datasets are small and could have been handled at the application layer, but some are quite massive (especially genomic data). DuckDB ships bundled with the CLI, works on any platform, is super lightweight, and runs easily in CI or on a user’s machine against datasets of all sizes.
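
To make "validation engine" a bit more concrete, here is a minimal sketch of the general idea, not the actual CLI's code: each validation rule is just a SQL query that returns the offending rows. The `samples` table, its columns, and the three rules are all invented for illustration. It uses the `duckdb` Python package if it is installed and otherwise falls back to the standard library's `sqlite3`, since the SQL here is plain enough to run on both.

```python
# Sketch only: the table, columns, and rules are hypothetical examples.
try:
    import duckdb  # third-party package: pip install duckdb
    con = duckdb.connect()  # in-memory database
except ImportError:
    # Fallback so the sketch still runs without DuckDB installed.
    import sqlite3
    con = sqlite3.connect(":memory:")

con.execute("CREATE TABLE samples (sample_id VARCHAR, depth_m DOUBLE, temp_c DOUBLE)")
con.executemany(
    "INSERT INTO samples VALUES (?, ?, ?)",
    [
        ["S1", 10.0, 4.2],
        ["S2", -3.0, 5.1],   # negative depth
        ["S3", 25.0, None],  # missing temperature
        ["S1", 40.0, 3.9],   # duplicate sample_id
    ],
)

# Each rule is a query that selects the rows violating it.
RULES = {
    "negative_depth": "SELECT sample_id FROM samples WHERE depth_m < 0",
    "missing_temp": "SELECT sample_id FROM samples WHERE temp_c IS NULL",
    "duplicate_id": (
        "SELECT sample_id FROM samples "
        "GROUP BY sample_id HAVING COUNT(*) > 1"
    ),
}


def validate(con):
    """Run every rule and return {rule_name: sorted list of offending sample_ids}."""
    report = {}
    for name, sql in RULES.items():
        rows = con.execute(sql).fetchall()
        report[name] = sorted(row[0] for row in rows)
    return report


report = validate(con)
for name, ids in report.items():
    print(name, ids)
```

With real data, the `CREATE TABLE` and inserts would presumably be replaced by DuckDB reading the files directly, which is where the larger-than-memory datasets mentioned above come in.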

There are other embeddable options out there, but I found DuckDB fit better for the potentially massive datasets, and also because of how naturally it ingests the types of data we work with, some of its unique features, and how trivial it was to learn and integrate with the project.

Otherwise I use it almost daily for doing guardrailed data exploration with LLMs. I prefer SQL over random DSLs in AWS or Sentry or what have you. I’ll ingest the data I need and just run SQL against it. I mentioned in another comment that I tend to store the more useful data (especially data I export routinely, like infra cost reports) on S3 and use a Rill instance to do basic exploration in a GUI (it will query remote Parquet files).