Comment by progval
4 days ago
The pure Python code in the last example is more verbose than it needs to be.
groups = {}
for row in filtered:
    key = (row['species'], row['island'])
    if key not in groups:
        groups[key] = []
    groups[key].append(row['body_mass_g'])
can be rewritten as:
import collections

groups = collections.defaultdict(list)
for row in filtered:
    groups[(row['species'], row['island'])].append(row['body_mass_g'])
and
variance = sum((x - mean) ** 2 for x in values) / (n - 1)
std_dev = math.sqrt(variance)
as:
std_dev = statistics.stdev(values)
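Both rewrites are easy to sanity-check end-to-end. A minimal runnable sketch, with made-up rows standing in for the article's `filtered` penguin data:

```python
import collections
import math
import statistics

# Hypothetical rows standing in for `filtered` from the article.
filtered = [
    {'species': 'Adelie', 'island': 'Torgersen', 'body_mass_g': 3750},
    {'species': 'Adelie', 'island': 'Torgersen', 'body_mass_g': 3800},
    {'species': 'Gentoo', 'island': 'Biscoe', 'body_mass_g': 5000},
]

# Grouping: verbose dict version with an explicit membership check.
groups_verbose = {}
for row in filtered:
    key = (row['species'], row['island'])
    if key not in groups_verbose:
        groups_verbose[key] = []
    groups_verbose[key].append(row['body_mass_g'])

# defaultdict version: missing keys become empty lists on first access.
groups = collections.defaultdict(list)
for row in filtered:
    groups[(row['species'], row['island'])].append(row['body_mass_g'])

assert dict(groups) == groups_verbose

# Std dev: the manual Bessel-corrected formula vs. statistics.stdev.
values = groups[('Adelie', 'Torgersen')]
n = len(values)
mean = sum(values) / n
variance = sum((x - mean) ** 2 for x in values) / (n - 1)
std_dev = math.sqrt(variance)

assert math.isclose(std_dev, statistics.stdev(values))
```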
> (n - 1)
It's also funny that one would write their own standard deviation function and include Bessel's correction. Usually if I'm manually re-implementing a standard deviation function it's because I'm afraid the implementors blindly applied the correction without considering whether or not it's actually meaningful for the given analysis. At the very least, the correct name for what's implemented there should really be `sample_std_dev`.
It is sadly really inconsistent. The stdlib `statistics` module has two separate functions: `stdev` for sample and `pstdev` for population. NumPy and pandas both have `.std()` with `ddof` (delta degrees of freedom) as a parameter, but NumPy defaults to 0 (population) and pandas to 1 (sample).
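The stdlib pair makes the distinction explicit; a small sketch showing the two estimators side by side (NumPy's and pandas' `ddof` defaults are just parameterized versions of the same two formulas):

```python
import math
import statistics

values = [3750, 3800, 5000]
n = len(values)
mean = sum(values) / n

# Sample standard deviation: Bessel's correction, divide by n - 1.
sample = statistics.stdev(values)
# Population standard deviation: divide by n.
population = statistics.pstdev(values)

assert math.isclose(sample ** 2, sum((x - mean) ** 2 for x in values) / (n - 1))
assert math.isclose(population ** 2, sum((x - mean) ** 2 for x in values) / n)
# For n > 1 and nonzero spread, the corrected estimate is always larger.
assert sample > population
```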
There's also itertools.groupby, maybe not much shorter (need to define the keyfunc, sort, then iterate), but it does make the intent obvious.
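A sketch of that route (define the keyfunc, sort, then iterate), again with made-up rows in place of the article's data. Note that `groupby` only merges *adjacent* rows with equal keys, which is why the sort is mandatory:

```python
import itertools

# Hypothetical rows standing in for `filtered` from the article.
filtered = [
    {'species': 'Gentoo', 'island': 'Biscoe', 'body_mass_g': 5000},
    {'species': 'Adelie', 'island': 'Torgersen', 'body_mass_g': 3750},
    {'species': 'Adelie', 'island': 'Torgersen', 'body_mass_g': 3800},
]

def keyfunc(row):
    return (row['species'], row['island'])

# Sort first so equal keys are adjacent, then group.
groups = {
    key: [row['body_mass_g'] for row in rows]
    for key, rows in itertools.groupby(sorted(filtered, key=keyfunc), key=keyfunc)
}

assert groups == {
    ('Adelie', 'Torgersen'): [3750, 3800],
    ('Gentoo', 'Biscoe'): [5000],
}
```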
Disagree.
In the first instance, the original code is readable and tells me exactly what's what. In your example, you're sacrificing readability for being clever.
Clear code (even if verbose) is better than clever code.
Using a very common utility from the standard library to avoid reinventing the wheel is not "clean code"?
defaultdict is ubiquitous in modern Python, and is far from a complicated concept to grasp.
I don't think that's the right metaphor to use here, it exists at a different level than what I would consider "reinventing the wheel". That to me is more some attempt to make a novel outward-facing facet of the program when there's not much reason to do so. For example, reimplementing shared memory using a custom kernel driver as your IPC mechanism, despite it not doing anything that shared memory doesn't already do.
The difference between the examples is so trivial I'm not really sure why the parent comment felt compelled to complain.
Imo, if you read such code for the first time, you may prefer the first. If you read it for the 20th time, you may prefer the second. Once you understand what you're doing, you often prefer more concise syntax that helps in handling complexity within a larger project. But it can seem a bit "too clever" in the beginning.
This happened to me with comprehensions in python, and with JS' love for anonymous/arrow functions.
Once you get used to a language's "quirks" (so long as they're considered idiomatic), they no longer feel quirky, and it's usually pretty quick.
I think code clarity is subjective. I find the second easier to read because I have to look at less code. When I read code, I instinctively take it apart and see how it fits together, so I have no problem with the second approach. The first approach is twice as long, so it takes me roughly twice as long to read.
The 2nd version is the most idiomatic.
Interesting! Thanks for the responses. I'm not python native and haven't worked as extensively with python as some of you here.
That said, I'll change my mind here and agree on using the standard library, but I'd still keep a separate 'key' assignment for clarity.
I would keep the explicit key= assignment since it's more than just a single literal but otherwise the second version is more idiomatic and readable.