Comment by postalcoder

25 days ago

I've migrated off of pandas to polars for my workflows to reap the benefit of, in my experience, a 10-20x speedup on average. I can't imagine anything bringing me back short of a performance miracle. LLMs have made syntax almost a non-barrier.

29 comments


lvl155 25 days ago

Went from pandas to polars to duckdb. As mentioned elsewhere, SQL is the most readable for me, and an LLM does most of the coding on my end (quant). So I need the code at the most readable and rudimentary/step-wise level.
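A rough sketch of what "step-wise" SQL can look like: each CTE is one small, named step that's easy to read and to ask an LLM to modify. It uses Python's built-in `sqlite3` as a stand-in so it runs without installing anything. The table, columns, and data are hypothetical, and the same `WITH ... AS` and `LAG()` syntax also works in DuckDB.

```python
import sqlite3

# In-memory SQLite database standing in for DuckDB (hypothetical table and data)
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE trades (ticker TEXT, day TEXT, price REAL)")
con.executemany(
    "INSERT INTO trades VALUES (?, ?, ?)",
    [
        ("A", "2024-01-01", 10.0),
        ("A", "2024-01-02", 11.0),
        ("A", "2024-01-03", 12.1),
        ("B", "2024-01-01", 20.0),
        ("B", "2024-01-02", 18.0),
    ],
)

QUERY = """
WITH
-- Step 1: attach the previous day's price to each row
with_prev AS (
    SELECT ticker, day, price,
           LAG(price) OVER (PARTITION BY ticker ORDER BY day) AS prev_price
    FROM trades
),
-- Step 2: simple daily return, dropping each ticker's first day
daily_returns AS (
    SELECT ticker, day, price / prev_price - 1 AS ret
    FROM with_prev
    WHERE prev_price IS NOT NULL
)
-- Step 3: summarize per ticker
SELECT ticker, COUNT(*) AS n_days, ROUND(AVG(ret), 4) AS avg_ret
FROM daily_returns
GROUP BY ticker
ORDER BY ticker
"""

rows = con.execute(QUERY).fetchall()
print(rows)
```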

OT, but I can’t imagine data science being a job category for too long. It’s got to be one of the first to go in the AI age, especially since the market is so saturated with mediocre talent.

data-ottawa 24 days ago

As a longtime DS, I sadly feel we filled the field with people who don’t do any actual data science or engineering. A lot of them are glorified BI users who at most pull some averages and run half-baked A/B tests.
I don’t think the field will go away with AI. Frankly, with LLMs I’ve automated the bottom 80% of queries I used to have to run for other users, and now I just focus on the actually hard problems.
That “build a self-serve dashboard” or number-fetching work is now an agentic tool I built.
But the real meat of “my business specializes in X, we need models to do this well” has not yet been replaceable. I think most hard DS work is internal, so it isn’t in training sets (yet).
claytonjy 24 days ago
Even before LLMs, Data Science was being replaced by more specialization, IME.
Data Engineers took over the plumbing once they moved on from Scala and Spark. ML Engineers took over the modeling (and LLMs are now killing this job too, as it’s rare to need model training outside of big labs). Data analysts have to know SQL and Python these days, and most DS are now just that, but with a nicer title and higher pay.
Once upon a time I thought DS would be much more about deeper statistics and causal inference, but those have proven to be rare, niche needs outside soft science academia.
- datsci_est_2015 24 days ago
  
  Reading a comment like this makes me realize how broad the title “Data Scientist” is, especially this tidbit:
  > as it’s rare to need model training outside of big labs
  Do you think there are pre-trained models for e.g. process optimization for the primary metallurgy process for steel manufacturing? Industrial engineers don’t know anything about machine learning (by trade), and there are companies that bring specialized Data Science know-how to that industry to improve processes using modern data-driven methods, especially model building.
  It’s almost like 99% of comments on this topic think that DS begins at image classification and ends at LLMs, with maybe a little bit of landing page A/B testing or something. Wild.
  > Once upon a time I thought DS would be much more about deeper statistics and causal inference, but those have proven to be rare, niche needs outside soft science academia.
  This is my entire career lol.
datsci_est_2015 24 days ago

> It’s got to be one of the first to go in the AI age, especially since the market is so saturated with mediocre talent.
Depends on what you mean by “to go.” Responsibilities swallowed by peers? Sure, and new job titles like Research & Development Engineer or something might pop up.
The discipline of creating automated systems to extract insights from data to create business value? I can’t really see that going anywhere. I mean, why tf would we be building so many data centers if there’s no value in the data they’re storing.
iugtmkbdfil834 25 days ago
<< It’s got to be one of the first to go in the AI age, especially since the market is so saturated with mediocre talent.
This is interesting. I wanted to dig into it a little since I am not sure I am following the logic of that statement.
Do you mean that AI would take over the field because, by default, most people there are already not producing anything that a simple 'talk to data' LLM couldn't deliver?
- mynameisash 24 days ago
  
  Not GP, but as a data engineer who has worked with data scientists for 20 years, I think the assessment is unfortunately true.
  I used to work on teams where DS would put a ton of time into building quality models, gating production with defensible metrics. Now, my DS counterparts are writing prompts and calling it a day. I'm not at all convinced that the results are better, but I guess if you don't spend time (=money) on the work, it's hard to argue with the ROI?
  

mritchie712 25 days ago

also migrated, but to duckdb.

It's funny to look back at the tricks that were needed to get GPT-3 and GPT-3.5 to write SQL (e.g. "you are a data analyst looking at a SQL database with table [tables]"). It's almost effortless now.
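A minimal sketch of that old trick, assuming an SQLite database: pull every table's `CREATE` statement out of `sqlite_master` and paste it into the prompt. `build_sql_prompt` and the prompt wording are illustrative and hypothetical, not any particular tool's API.

```python
import sqlite3


def build_sql_prompt(con: sqlite3.Connection, question: str) -> str:
    """Hypothetical helper: inline every table's CREATE statement into the prompt,
    the kind of context-stuffing early text-to-SQL prompts relied on."""
    ddl = [
        row[0]
        for row in con.execute(
            "SELECT sql FROM sqlite_master WHERE type = 'table' ORDER BY name"
        )
    ]
    schema = "\n".join(ddl)
    return (
        "You are a data analyst looking at a SQL database with the following tables:\n"
        f"{schema}\n\n"
        f"Write a single SQL query that answers: {question}\n"
        "Return only the SQL."
    )


# Hypothetical example schema
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, total REAL)")
prompt = build_sql_prompt(con, "What is the total revenue per customer?")
print(prompt)
```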

wodenokoto 24 days ago

Do you use it from within Python, or just ingest straight into duckdb.exe or the DuckDB UI?

howling 25 days ago

Same. I don't normally even use an LLM, as I find polars' syntax very intuitive. I just searched my ChatGPT history, and the only times I used it were when I was dealing with list and struct columns, which weren't in pandas.

postalcoder 25 days ago
iirc part of pandas’ popularity was that it modeled some of R’s ergonomics. What a time in history, when such things mattered! (To be clear, I’m not making fun of pandas. It was the bridge I crossed that moved me from living in Excel to living in code.)
- iugtmkbdfil834 25 days ago
  
  I learned about pandas with R in my class way back when. At the time, it seemed like magic. In a sense, it still does, but things evolve.

gHA5 25 days ago

Do you not experience LLM-generated code constantly trying to use Pandas' methods/syntax on Polars objects?

edschofield 25 days ago
Yes, ChatGPT 5.2 Pro absolutely still does this. Just ask it for a pivot table using Polars and it will probably spit out code with Pandas arguments that doesn’t work.
postalcoder 25 days ago
There were some growing pains in the GPT-3.5 to GPT-4 era, but not nowadays (shoutout to the now-defunct Phind, which was a game changer back then).
- crimsoneer 25 days ago
  
  The fact that they pivoted away from their very compelling core offering (an AI Stack Overflow) to compete with Lovable etc. in the giant "AI-generated apps" fight continues to baffle me. Though I guess model updates ate their lunch.
  

thibaut_barrere 25 days ago

Polars being so fast, and embeddable into other languages, has made it a no-brainer for me to adopt it.

I have integrated Explorer https://github.com/elixir-explorer/explorer, which leverages it, into many Elixir apps, and I'm so happy to have it.

OutOfHere 25 days ago

The speedup you claim is going to be contingent on how you use Pandas, with which data types, and which version of Pandas.
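One concrete way usage matters: row-wise `apply` runs a Python function per row, while a vectorized expression runs as a single columnar operation. This sketch uses made-up data and only shows that both give the same result. The printed timings will vary by machine and pandas version, and this is not a reproduction of the 10-20x claim.

```python
import time

import numpy as np
import pandas as pd

# Made-up data: 20,000 rows of prices and quantities
rng = np.random.default_rng(0)
n = 20_000
df = pd.DataFrame({"price": rng.random(n), "qty": rng.integers(1, 100, n)})

# Row-wise apply: a Python-level function call for every row
t0 = time.perf_counter()
slow = df.apply(lambda row: row["price"] * row["qty"], axis=1)
t_apply = time.perf_counter() - t0

# Vectorized: one columnar multiply executed in NumPy
t0 = time.perf_counter()
fast = df["price"] * df["qty"]
t_vec = time.perf_counter() - t0

print(f"apply: {t_apply:.4f}s  vectorized: {t_vec:.4f}s")
```

Data types matter too: pandas 2.x can load data into PyArrow-backed dtypes (e.g. `pd.read_csv(..., dtype_backend="pyarrow")`), which can change performance, especially for string columns.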

alex7o 25 days ago

Same. Polars also works in TypeScript, which I used at one point to move my data from the backend to the frontend.

thegabriele 25 days ago

" 10-20x speedup on average. "

Is this everyone's experience?

OGWhales 24 days ago

It depends on the specifics, but I recently converted a couple of scripts that took minutes to run with Pandas and only seconds with Polars. I was pretty impressed.
mynameisash 24 days ago

That was probably about what I got when I migrated some heavy number crunching code from Pandas to Polars a few years ago. Maybe even better than that.
mjhay 24 days ago

It’s a typical experience. Polars is fast, and Pandas is very slow and memory-hungry. It would be one thing if Pandas had a good API, but it doesn’t.