
Comment by briHass

15 hours ago

More importantly, you have your data in a structured format that can be easily inspected at any stage of the pipeline using a familiar tool: SQL.

I've been using this pattern (scripts or code that execute commands against DuckDB) to process data more recently, and the ability to do deep investigations on the data as you're designing the pipeline (or when things go wrong) is very useful. Doing it with a code-based solution (reading data into objects in memory) makes it much harder to view the data. Debugging tools for inspecting objects on the heap are painful compared to being able to JOIN/WHERE/GROUP BY your data.

Yep. It’s literally what SQL was designed for: your business website can be running it… and then you write a shell script to also pull some data on a cron. It’s beautiful.