Comment by skeeter2020

4 hours ago

>> DuckDB is amazing for any sort of fast data analysis when the data is small enough that it can fit on your laptop

I agree, and here's the (not so) dirty secret that big data providers like Snowflake try to hide: the majority of your work is not big data and WILL fit on your local machine. My last company was spending $2M/yr on a contract with Snowflake, and another million between Fivetran and Matillion. Of the 1200 clients using analytics, maybe 2 had enough data to warrant "infinite scalability", and a dozen wanted Snowflake because they already had corporate warehouses in Snowflake (they probably didn't need it either). Turns out the Extract and Load could be handled by bog-standard C# code and a bunch of SQL, while almost everyone was better off with a DuckDB database running locally, often in the browser. You've probably heard of YAGNI (You Ain't Gonna Need It), and it applies even more to "Big Data". #SmallDataConvert

Folks have been beating this drum for as long as I've worked in software, dating to the Hadoop era, and it remains true today. So much of "big data" only appears big because it's poorly stored, or is represented wastefully (in persistent storage or in memory).
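
To make the "represented wastefully" point concrete, here is a rough sketch using only the Python standard library. The `sensor_id` and `reading` fields and the `encoded_sizes` helper are made up for illustration. It encodes the same integers once as one JSON object per row and once as a single packed column of 16-bit values, then compares the byte counts.

```python
import array
import json


def encoded_sizes(n: int = 100_000) -> tuple[int, int]:
    """Return (json_bytes, packed_bytes) for the same n integer readings."""
    # Hypothetical data: n readings, each in the range 0..999.
    readings = [i % 1000 for i in range(n)]

    # Wasteful: one JSON object per row, repeating the key names every time.
    rows = [{"sensor_id": 7, "reading": r} for r in readings]
    as_json = json.dumps(rows).encode("utf-8")

    # Compact: one column of unsigned 16-bit integers. The constant
    # sensor_id would be stored once elsewhere, not once per row.
    as_packed = array.array("H", readings).tobytes()

    return len(as_json), len(as_packed)


json_bytes, packed_bytes = encoded_sizes()
print(f"JSON rows: {json_bytes:,} bytes; packed column: {packed_bytes:,} bytes")
```

The row-wise JSON comes out more than ten times larger than the packed column, and that is before any compression or columnar encoding of the kind Parquet or DuckDB apply. Real datasets are messier than this toy, so treat the ratio as illustrative only.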