Comment by vegabook

5 hours ago

"revolutionary"? It just copied and pasted the decades-old R (previous "S") dataframe into Python, including all the paradigms (with worse ergonomics since it's not baked into the language).

No other modern language will compete with R on ergonomics because of how it allows functions to read the context they’re called in, and S expressions are incredibly flexible. The R manual is great.

To say pandas just copied it but worse is overly dismissive. The core of pandas has always been indexing/reindexing, split-apply-combine, and slicing views.

It’s a different approach than R’s data tables or frames.
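For anyone who hasn't met the term, split-apply-combine looks roughly like this. It's a plain-Python sketch of the pattern only, not pandas code, and the rows are made up for illustration. In pandas the same thing is a `groupby` followed by an aggregation.

```python
from collections import defaultdict
from statistics import mean

# Made-up rows, for illustration only.
rows = [
    {"species": "Adelie", "weight": 3700},
    {"species": "Gentoo", "weight": 5000},
    {"species": "Adelie", "weight": 3900},
    {"species": "Gentoo", "weight": 5200},
]

# Split: bucket the weights by species.
groups = defaultdict(list)
for row in rows:
    groups[row["species"]].append(row["weight"])

# Apply: reduce each group independently.
# Combine: collect the per-group results into one mapping keyed by group.
mean_weight = {species: mean(weights) for species, weights in groups.items()}

print(mean_weight)
```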

  • > allows functions to read the context they’re called in

    Can you show an example? Seems interesting considering that code knowing about external context is not generally a good pattern when it comes to maintainability (security, readability).

    I’ve lived through some horrific 10M-line ColdFusion codebases that embraced this paradigm to death - they were a whole other extreme where you could _write_ variables in the scope of where you were called from!

    • Say I have a dataframe called 'penguins'

      I can write code like: penguin_sizes <- select(penguins, weight, height)

      Here, weight and height are columns inside the dataframe. But I can refer to them as if they were objects in the environment (i.e., without quotes) because the select function looks for them inside the penguins dataframe (its first argument).

      This is a very simple example, but the pattern is used extensively in some R paradigms.
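For the Python folks, here's a rough sketch of the same idea. Python evaluates arguments before the call, so a function never sees bare names the way R's does; the closest stand-in is to pass the names as a string and resolve them against the dataframe. Everything below is hypothetical: `select` and `mutate` are toy functions over a plain dict of lists (not the pandas or dplyr APIs), and the numbers are made up.

```python
def select(frame, columns):
    """Return a new frame holding only the named columns.

    `columns` is a string of bare column names, e.g. "weight, height".
    Names are looked up in `frame`, not in the caller's variables.
    """
    names = [name.strip() for name in columns.split(",")]
    missing = [name for name in names if name not in frame]
    if missing:
        raise KeyError(f"unknown column(s): {missing}")
    return {name: frame[name] for name in names}


def mutate(frame, new_column, expression):
    """Add a column computed row by row from an expression over column names."""
    n_rows = len(next(iter(frame.values())))
    values = []
    for i in range(n_rows):
        row = {name: column[i] for name, column in frame.items()}
        # The row dict is the local namespace, so `weight / height`
        # resolves to this row's values rather than to outer variables.
        values.append(eval(expression, {"__builtins__": {}}, row))
    return {**frame, new_column: values}


# Made-up data, for illustration only.
penguins = {
    "species": ["Adelie", "Gentoo", "Chinstrap"],
    "weight": [3700, 5000, 3800],
    "height": [70, 80, 68],
}

penguin_sizes = select(penguins, "weight, height")
with_ratio = mutate(penguin_sizes, "ratio", "weight / height")
```

The `eval` call is fine for a toy like this but shouldn't be fed untrusted strings, which is more or less the maintainability and security worry raised above.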

This is an interesting question.

Dataframes first appeared in S-PLUS in 1991-1992. Then R copied S, and from around 1995-1997 onwards R started to grow in popularity in statistics. As free and open source software, R started to take over the market among statisticians and others who had been using other statistical software, mainly SAS, SPSS and Stata.

Given that S and R existed, why were they mostly not picked up by data analysts and programmers in 1995-2008, and only Python and Pandas made dataframes popular from 2008 onwards?

Exactly. I was programming in R in 2004 and Pandas didn't exist. I remember trying Pandas once and it felt unergonomic for data analysis, and it lacked the vast collection of statistical analysis libraries.

It was revolutionary to Python. Without NumPy and Pandas, ML in Python would never have been a thing.

(Yes, yes - I know some people wish that were the case!)