Comment by kylejrp
4 years ago
As part of the Stack Overflow April Fools' prank, we did some data analysis on copy behavior on the site [0]. The most copied answer during the collection period (~1 month) was "How to iterate over rows in a DataFrame in Pandas" [1], receiving 11k copies!
[0] https://stackoverflow.blog/2021/04/19/how-often-do-people-ac...
That’s sad, as when you find yourself iterating over rows in pandas you’re almost invariably doing some wrong or very very sub optimally.
To me it's an means to an end. I don't care if my solution takes 100ms instead of 1ms, it's the superior choice for me if it takes me 1 minute to do it instead of 10 minutes to learn something new.
True, but sometimes these 10 minutes help you to discover something new that will improve your code.
I had a few of these cases in my life:
- discovering optimized patterns in Perl, which led to code I could not understand the next day
- discovering decorators in Python, which led to better code
- discovering comprehensions in Python (a magical thing) that led to better code, except when I wanted to be too clever and ended up with Perl-like code
The flaw of human nature on display. And I don't mean that personal, not to you anyway, but to the human species.
7 replies →
I iterate over rows in pandas fairly often for plotting purposes. Anytime I want to draw something more complicated than a single point for each row, I find it's simple and straight-forward to just iterrows() and call the appropriate matplotlib functions for each. It does mean some plots that are conceptually pretty simple end up taking ~5 seconds to draw, but I don't mind. Is there really a better alternative that isn't super complicated? Keep in mind that I frequently change my mind about what I'm plotting, so simple code is really good (it's usually easier to modify) even if it's a little slower.
>That’s sad, as when you find yourself iterating over rows in pandas you’re almost invariably doing some wrong or very very sub optimally.
Humans writing code is suboptimal. I can't wait for the day when robots/AI do it for us. I just hope it leads to a utopia and not a dystopia.
I'm glad that DataFrames don't iterate by default. It's good design to make suboptimal features hard to access.
I got bitten by that prank when copying code from a question, to see what it did (it was something obviously harmless). I was rather annoyed for about two seconds before I realized what date it was. :)