Comment by postalcoder
6 hours ago
I've migrated off of pandas to polars for my workflows to reap the benefit of what is, in my experience, a 10-20x speedup on average. I can't imagine anything bringing me back short of a performance miracle. LLMs have made syntax almost a non-barrier.
Went from pandas to polars to duckdb. As mentioned elsewhere, SQL is the most readable for me, and an LLM does most of the coding on my end (quant). So I need it at the most readable and rudimentary/step-wise level.
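To make "step-wise" concrete, here is a minimal sketch of SQL written as one named step per CTE. It uses the standard-library sqlite3 module as a stand-in so it runs without extra installs; it is not DuckDB, and the trades table and its columns are made up for illustration.

```python
import sqlite3

# Hypothetical table and data, invented for this sketch.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE trades (symbol TEXT, qty INTEGER, price REAL)")
conn.executemany(
    "INSERT INTO trades VALUES (?, ?, ?)",
    [("AAA", 10, 2.0), ("AAA", 5, 4.0), ("BBB", 1, 100.0)],
)

# Each CTE is one small, readable step that can be reviewed on its own.
query = """
WITH notional AS (              -- step 1: value of each trade
    SELECT symbol, qty * price AS value
    FROM trades
),
per_symbol AS (                 -- step 2: total value per symbol
    SELECT symbol, SUM(value) AS total_value
    FROM notional
    GROUP BY symbol
)
SELECT symbol, total_value      -- step 3: rank symbols by total value
FROM per_symbol
ORDER BY total_value DESC
"""

rows = conn.execute(query).fetchall()
print(rows)
```

Written this way, each step can be run and checked separately, which helps when an LLM wrote the query and a human has to review it.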
OT, but I can’t imagine data science being a job category for too long. It’s got to be one of the first to go in AI age especially since the market is so saturated with mediocre talents.
As a long time DS I sadly feel we filled the field with people who don’t do any actual data science or engineering. A lot of it is glorified BI users who at most pull some averages and run half baked AB tests.
I don’t think the field will go away with AI, frankly with LLMs I’ve automated that bottom 80% of queries I used to have to do for other users and now I just focus on actual hard problems.
That “build a self serve dashboard” or number fetching is now an agentic tool I built.
But the real meat of “my business specializes in X, we need models to do this well” has not yet been replaceable. I think most hard DS work is internal so isn’t in training sets (yet).
<< It’s got to be one of the first to go in AI age especially since the market is so saturated with mediocre talents.
This is interesting. I wanted to dig into it a little since I am not sure I am following the logic of that statement.
Do you mean that AI would take over the field, because by default most people there are already not producing anything that a simple 'talk to data' LLM won't deliver?
Not GP, but as a data engineer who has worked with data scientists for 20 years, I think the assessment is unfortunately true.
I used to work on teams where DS would put a ton of time into building quality models, gating production with defensible metrics. Now, my DS counterparts are writing prompts and calling it a day. I'm not at all convinced that the results are better, but I guess if you don't spend time (=money) on the work, it's hard to argue with the ROI?
also migrated, but to duckdb.
It's funny to look back at the tricks that were needed to get gpt3 and 3.5 to write SQL (e.g. "you are a data analyst looking at a SQL database with table [tables]"). It's almost effortless now.
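For anyone who missed that era, the trick was roughly the following. This is a sketch only: build_sql_prompt is a hypothetical helper, the orders table is made up, and sqlite3 is used just to have a schema to read. No model is actually called.

```python
import sqlite3

def build_sql_prompt(conn: sqlite3.Connection, question: str) -> str:
    """Hypothetical helper: put the schema and a role preamble in front of the question."""
    # sqlite_master keeps the original CREATE TABLE statement for each table.
    tables = conn.execute(
        "SELECT name, sql FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    names = ", ".join(name for name, _ in tables)
    schema = "\n".join(sql for _, sql in tables)
    return (
        f"You are a data analyst looking at a SQL database with tables {names}.\n"
        f"Schema:\n{schema}\n"
        f"Write one SQL query that answers: {question}"
    )

# Made-up schema for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, customer TEXT, amount REAL)")

prompt = build_sql_prompt(conn, "What is the total amount per customer?")
print(prompt)
```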
Do you use it from within Python or just ingest straight into duckdb.exe or duckdb UI?
Same. I don't even use an LLM normally, as I found polars' syntax to be very intuitive. I just searched my ChatGPT history and the only times I used it were when I was dealing with list and struct columns, which don't exist in pandas.
iirc part of pandas’ popularity was that it modeled some of R’s ergonomics. What a time in history, when such things mattered! (To be clear, I’m not making fun of pandas. It was the bridge I crossed that moved me from living in Excel to living in code.)
I learned about pandas with R in my class way back when. At the time, it seemed like magic. In a sense, it still does, but things evolve.
Polars being so fast, and embeddable into other languages, has made it a no brainer for me to adopt it.
I have integrated Explorer https://github.com/elixir-explorer/explorer, which leverages it, into many Elixir apps, so happy to have this.
Do you not experience LLM generated code constantly trying to use Pandas' methods/syntax for Polars objects?
Yes, ChatGPT 5.2 Pro absolutely still does this. Just ask it for a pivot table using Polars and it will probably spit out code with Pandas arguments that doesn’t work.
There were some growing pains in gpt-3.5 to gpt-4 era, but not nowadays (shoutout to the now-defunct Phind, which was a game changer back then).
The fact they pivoted away from their very compelling core offering (AI Stack Overflow) to compete with Lovable etc. in the "AI generated apps" giant fight continues to baffle me. Though I guess model updates ate their lunch.
"10-20x speedup on average."
Is this everyone's experience?
It depends on the specifics, but I converted a couple of scripts recently that would take minutes to run with Pandas that only took seconds to run with Polars. I was pretty impressed.
That was probably about what I got when I migrated some heavy number crunching code from Pandas to Polars a few years ago. Maybe even better than that.
Same. Polars also works in TypeScript, which I used at one point to move my data from backend to frontend.
The speedup you claim is going to be contingent on how you use Pandas, with which data types, and which version of Pandas.