
Comment by oreilles

2 days ago

It is superior because you don't need to assign your dataframe to a variable ('df'), then update that variable or create a new one every time you need to do that operation. That makes it both safer (you're guaranteed to filter on the current version of the dataframe) and more concise.
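To make the comparison concrete, here is a minimal sketch. It assumes the operation under discussion is filtering inside a pandas method chain by passing a callable to `.loc`; the column names and sample data are hypothetical and only there for illustration.

```python
import pandas as pd

# Hypothetical sample data, purely for illustration.
raw = pd.DataFrame({"price": [10.0, 25.0, 40.0], "qty": [3, 0, 2]})

# Step-by-step style: every line has to name the right intermediate variable.
df = raw.assign(total=raw["price"] * raw["qty"])
df = df[df["total"] > 0]

# Chained style: each lambda receives the dataframe as it exists at that step,
# so there is no intermediate variable to keep in sync.
chained = (
    raw
    .assign(total=lambda d: d["price"] * d["qty"])
    .loc[lambda d: d["total"] > 0]
)

print(chained.equals(df))
```

Both versions produce the same rows here; the difference is that the chained one cannot accidentally filter on a stale 'df'.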

For the rest of your comment: it's the best you can do in Python. Sure, you could write SQL, but then you're mixing text queries with Python data manipulation, and I would dread that. And SQL-only scripting is really out of the question.

Eh, SQL and Python can still work together very well when SQL takes the place of pandas. Doing things in waves or batches helps.

The big problem with pandas is that you still have to load the dataframe into memory to work with it. My data's too big for that, and Postgres makes that problem go away almost entirely.
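A rough sketch of that batch pattern, with loud assumptions: the standard-library `sqlite3` module stands in for Postgres so the snippet runs on its own, and the `events` table, its columns, and the batch size are all made up for illustration. The idea is that the database does the aggregation and Python only ever holds one small batch of result rows.

```python
import sqlite3

# sqlite3 is a stand-in for Postgres here; the table and data are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (user_id INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [(i % 3, float(i)) for i in range(10)],
)

# The heavy lifting (grouping and summing) happens inside the database.
cursor = conn.execute(
    "SELECT user_id, SUM(amount) FROM events GROUP BY user_id ORDER BY user_id"
)

# Pull results in small batches instead of loading everything at once.
totals = {}
while True:
    batch = cursor.fetchmany(2)  # batch size of 2 is arbitrary
    if not batch:
        break
    for user_id, total in batch:
        totals[user_id] = total

conn.close()
print(totals)
```

With a real Postgres driver the loop would look much the same, though whether rows are actually streamed from the server rather than buffered client-side depends on the driver and cursor type.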