Comment by thundergolfer
1 day ago
A lot of people here are commenting that if you have to care about specific latency numbers in Python you should just use another language.
I disagree. A lot of important, large codebases were grown and maintained in Python (Instagram, Dropbox, OpenAI), and it's damn useful to know how to reason your way out of a Python performance problem when you inevitably hit one, without dropping into another language, which is going to be far more complex.
Python is a very useful tool, and knowing these numbers just makes you better at using the tool. The author is a Python Software Foundation Fellow. They're great at using the tool.
In the common case, a performance problem in Python is not the result of hitting the limit of the language but the result of sloppy un-performant code, for example unnecessarily calling a function O(10_000) times in a hot loop.
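To make the hot-loop point concrete, here is a minimal sketch. The names `normalize`, `slow`, and `fast` are hypothetical, not from any codebase mentioned here, and the actual speedup depends on your machine and Python version.

```python
import timeit


def normalize(prefix):
    # Hypothetical helper standing in for any call whose result
    # does not change between loop iterations.
    return prefix.strip().lower()


def slow(items, prefix):
    # Calls normalize() once per item, even though the result is invariant.
    return [normalize(prefix) + item for item in items]


def fast(items, prefix):
    # Hoists the invariant call out of the loop: one call instead of len(items).
    p = normalize(prefix)
    return [p + item for item in items]


if __name__ == "__main__":
    items = [str(i) for i in range(10_000)]
    # Both versions must produce the same output before timing means anything.
    assert slow(items, "  KEY-") == fast(items, "  KEY-")
    for fn in (slow, fast):
        seconds = timeit.timeit(lambda: fn(items, "  KEY-"), number=100)
        print(f"{fn.__name__}: {seconds / 100 * 1e3:.3f} ms per run")
```

Nothing about the fix requires leaving Python; it only requires noticing that the call is repeated needlessly.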
I wrote up a more focused "Python latency numbers you should know" as a quiz here https://thundergolfer.com/computers-are-fast
I do performance optimization for a system written in Python. Most of these numbers are useless to me, because they’re completely irrelevant until they become a problem, then I measure them myself. If you are writing your code trying to save on method calls, you’re not getting any benefit from using the language and probably should pick something else.
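For what "measure them myself" can look like in practice, here is a rough sketch using the standard library's `cProfile` and `pstats`. The workload (`parse_line`, `load`) is a made-up stand-in, not taken from any real system.

```python
import cProfile
import io
import pstats


def parse_line(line):
    # Hypothetical workload: parse one comma-separated line of integers.
    return [int(x) for x in line.split(",")]


def load(lines):
    return [parse_line(line) for line in lines]


def profile(func, *args):
    # Run func under the profiler and return (result, text report).
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    out = io.StringIO()
    stats = pstats.Stats(profiler, stream=out)
    # Sort by cumulative time so the most expensive call paths come first.
    stats.sort_stats("cumulative").print_stats(10)
    return result, out.getvalue()


result, report = profile(load, ["1,2,3"] * 1000)
print(report)
```

The report shows where time actually goes in your own code, which is usually more useful than a generic table once a problem has appeared.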
It's always a balance.
Good designs do not happen in a vacuum; they are informed by knowledge of at least the outlines of the environment.
One can have a breakfast pursuing an idea -- let me spill some sticky milk on the dining table, who cares, I will clean up if it becomes a problem later.
Another is: it's not much of an overbearing constraint to avoid making a mess with spilt milk in the first place. Maybe it won't be a big bother later, but it's not hurting me much now to not be sloppy, so let me be a little hygienic.
There's a balance between making a mess and cleaning up and not making a mess in the first place. The other extreme is to be so defensive about the possibility of creating a mess that it paralyses progress.
The sweet spot is somewhere between the extremes and having the ball-park numbers in the back of one's mind helps with that. It informs about the environment.
No.
Python’s issue is that it is incredibly slow in use cases that surprise average developers. It is incredibly slow at very basic stuff, like calling a function or accessing a dictionary.
If Python didn’t have such an enormous number of popular C and C++ based libraries it would not be here. It was saved by NumPy and the like.
I'm not sure how Python can be described as "saved" by numpy et al., when the numerical Python ecosystem was there near the beginning, and the language and ecosystem have co-evolved? Why didn't Perl (with PDL), R or Ruby (or even php) succeed in the same way?
22ns for a function call and dictionary key lookup, that's actually surprisingly fast.
i hate python but if your bottleneck is that sqlite query, optimizing a handful of addition operations is a wash. that's why you need to at least have a feel for these tables
Agreed, and on top of that:
I think these kinds of numbers are everywhere and not just specific to Python.
In Zig, I sometimes take a brief look at the number of CPU cycles various operations take, to reduce cache misses, and I need to be aware of the alignment and size of data types to debloat a data structure. If their logic applied, too bad, I should quit programming, since every language has its own latencies for certain operations that we should be aware of.
There are reasons to not use Python, but that particular reason is not the one.
our build system is written in python, and i’d like it not to suck but still stay in python, so these numbers very much matter.
For some of these, there are alternative modules you can use, so it is important to know this. But if it really matters, I would think you'd know this already?
For me, it will help with selecting what language is best for a task. I think it won't change my view that python is an excellent language to prototype in though.
> ... a function O(10_000) times in a hot loop
O(10_000) is a really weird notation.
Generously we could say they probably mean ~10_000 rather than O(10_000)
I think both points are fair. Python is slow - you should avoid it if speed is critical, but sometimes you can’t easily avoid it.
I think the list itself is super long winded and not very informative. A lot of operations take about the same amount of time. Does it matter that adding two ints is very slightly slower than adding two floats? (If you even believe this is true, which I don’t.) No. A better summary would say “all of these things take about the same amount of time: simple math, function calls, etc. these things are much slower: IO.” And in that form the summary is pretty obvious.
> I think the list itself is super long winded and not very informative.
I agree. I have to compliment the author for the effort put in. However, it misses the point of the original "Latency Numbers Every Programmer Should Know", which is to build an intuition for making good ballpark estimates of the latency of operations, e.g. that A is two orders of magnitude more expensive than B.
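The "orders of magnitude" framing above is just a base-10 logarithm of the ratio between two latencies. A tiny sketch follows; the function name and the example latencies are illustrative assumptions, not measurements from the article.

```python
import math


def orders_of_magnitude(a_ns, b_ns):
    # How many powers of ten separate latency a from latency b.
    return math.log10(a_ns / b_ns)


# Made-up ballpark figures purely for illustration: an operation costing
# 10,000 ns versus one costing 100 ns.
gap = orders_of_magnitude(10_000, 100)
print(f"A is about {round(gap)} orders of magnitude more expensive than B")
```

For ballpark estimation only the rounded gap matters, which is why a few nanoseconds of difference between similar operations carries little information.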
> A lot of important and large codebases were grown and maintained in Python
How does this happen? Is it just inertia that causes people to write large systems in an essentially type-free, interpreted scripting language?
Small startups end up writing code in whatever gets things working faster, because having too large a codebase with too much load is a champagne problem.
If I told you that we were going to be running a very large payments system, with customers from startups to Amazon, you wouldn't write it in Ruby, put the data in MongoDB, and then use its oplog as a queue... but that's what Stripe looked like. They even hired a compiler team to add type checking to the language, as that made far more sense than porting a giant monorepo to something else.
It's very simple. Large systems start as small systems.
Large systems are often aggregates of small systems, too.
It’s a nice and productive language. Why is that incomprehensible?
Python has types, and now even gradual static typing if you want to go further. It's irrelevant whether a language is an interpreted scripting language if it solves your problem.
It’s very natural. Python is fantastic for going from 0 to 1 because it’s easy and forgiving. So lots of projects start with it. Especially anything ML focused. And it’s much harder to change tools once a project is underway.
this is absolutely true, but there's an additional nuance: yes, python is fantastic, yes, it's easy and forgiving, but there are other languages like that too. ...except there really aren't. other than ruby and maybe go, every other popular language sacrifices ease of use for things that simply do not matter for the overwhelming majority of programs. much of python's popularity doesn't come from being easy and forgiving, it's that everything else isn't. for normal programming why would we subject ourselves to anything but python unless we had no choice?
while I'm on the soapbox I'll give java a special mention: a couple years ago I'd have said java was easy even though it's tedious and annoying, but I've become reacquainted with it for a high school program (python wouldn't work for what they're doing and the school's comp sci class already uses java.)
this year we're switching to c++.
Most large things begin life as small things.
Someone says "let's write a prototype in Python" and someone else says "are you sure we shouldn't use a better language that is just as productive but isn't going to lock us into abysmal performance down the line?" but everyone else says "nah we don't need to worry about performance yet, and anyway it's just a prototype - we'll write a proper version when we need to"...
10 years later "ok it's too slow; our options are a) spend $10m more on servers, b) spend $5m writing a faster Python runtime before giving up later because nobody uses it, c) spend 2 years rewriting it and probably failing, during which time we can make no new features. a) it is then."
What many startups need to succeed is to be able to pivot/develop/repeat very quickly to find a product+market that makes money. If they don't find that, and most don't, the millions you talk about never come due. They also rarely have enough developers, so developer productivity in the short term is vital to that iteration speed. If that startup turns into Dropbox or Instagram, the millions you mention are round-off error on many billions. Easy business decision, and startups are first and foremost businesses.
Some startups end up in between the two extremes above. I was at one of the Python-based ones that ended up in the middle. At $30M in annual revenue, Python was handling 100M unique monthly visitors on 15 cheap, circa-2010 servers. By the time we hit $1B in annual revenue, we had Spark for both heavy batch computation and streaming computation tasks, and Java for heavy online computational workloads (e.g., online ML inference). There were little bits of Scala, Clojure, Haskell, C++, and Rust here and there (with well over 1K developers, things creep in over the years). 90% of the company's code was still in Python and it worked well. Of course there were pain points, but there always are. At $1B in annual revenue, there was budget for investments to make things better (cleaning up architectural choices that hadn't kept up, adding static types to core things, scaling up tooling around package management and CI, etc.).
But a key to all this... the product that got to $30M (and eventually $1B+) looked nothing like what was pitched to initial investors. It was unlikely that enough things could have been tried to land on the thing that worked without excellent developer productivity early on. Engineering decisions are not only about technical concerns, they are also about the business itself.
What language is “just as productive but isn't going to lock us into abysmal performance down the line”?
What makes that language not strictly superior to Python?
I don't know a better open source language than Python. Java and C# are both better (platforms) but they come with that obvious corporate catch.
If I made an app in python and in 10 years it grows so successful that it needs a $10m vertical scale or $5m rewrite, I wouldn't even complain.