
Comment by theodp

1 year ago

Yes, it's very convenient to be able to use SQL with your massively parallel commercial database (Oracle, Snowflake, etc.) and then again with the result sets (Pandas, etc.). Interestingly, it's a concept that was implemented 35 years ago in SAS (link below) but is just now gaining traction in today's "modern" software (e.g., via DuckDB).

USING THE NEW SQL PROCEDURE IN SAS PROGRAMS (1989) https://support.sas.com/resources/papers/proceedings-archive... The SQL procedure uses SQL to create, modify, and retrieve data from SAS data sets and views derived from those data sets. You can also use the SQL procedure to join data sets and views with those from other database management systems through the SAS/ACCESS software interfaces.
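As a rough illustration of the modern version of that idea, here is a minimal sketch of DuckDB running plain SQL directly over a Pandas DataFrame (the DataFrame and column names are made up for the example; it assumes duckdb and pandas are installed):

```python
import duckdb
import pandas as pd

# Pretend this DataFrame is a result set already pulled back from a warehouse query.
orders = pd.DataFrame({
    "region": ["east", "west", "east", "south"],
    "amount": [120.0, 80.5, 45.0, 230.0],
})

# DuckDB's replacement scans let plain SQL reference local DataFrames by name.
summary = duckdb.sql("""
    SELECT region, SUM(amount) AS total
    FROM orders
    GROUP BY region
    ORDER BY total DESC
""").df()

print(summary)
```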

Wow, that is really cool. One of my theses is that DuckDB will be bought by GCP (BigQuery), and Polars will be bought by Databricks (or AWS). The thesis is based on Snowflake's purchase of the Modin platform. The movement in data engineering seems to be towards data warehouse platforms streaming data (views/result sets) down to dataframe platforms (Modin, Polars, DuckDB), which then stream down to BI platforms. Because these database platforms are designed as OLAP platforms, this approach makes sense.
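A hedged sketch of that warehouse-to-dataframe-to-BI flow, using an in-memory DuckDB database as a stand-in OLAP layer and Polars as the dataframe layer (the table, columns, and data are hypothetical; it assumes duckdb, pyarrow, and polars are installed):

```python
import duckdb
import polars as pl

# In-memory DuckDB plays the role of the warehouse for this sketch.
con = duckdb.connect()
con.sql("""
    CREATE TABLE sales AS
    SELECT * FROM (VALUES
        ('2024-01-01', 'widget', 10),
        ('2024-01-02', 'widget', 7),
        ('2024-01-02', 'gadget', 3)
    ) AS t(day, product, qty)
""")

# The OLAP layer hands a result set down as Arrow, which Polars can read without copying.
arrow_result = con.sql("SELECT day, product, qty FROM sales").arrow()
df = pl.from_arrow(arrow_result)

# The dataframe layer does the last-mile shaping a BI tool would consume.
report = df.group_by("product").agg(pl.col("qty").sum().alias("total_qty"))
print(report)
```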