
Comment by benreesman

2 days ago

TLDR: Broken builds are the default everywhere; only exceptional effort and resources get you anything else. In Python, the people with those resources have unclear incentives to improve anything.

I think it's a combination of historical factors and contemporary misaligned incentives both in the small and the large. There are also some technical reasons why Python is sort of an "attractive nuisance" for really problematic builds.

The easy one that shouldn't be too controversial is that it has a massive C/C++ (and increasingly Rust) native code library ecosystem. That's hard to do under the best of circumstances, but it's especially tough in Python (paradoxically, because Python is so good at this: when wrapping a proven fast library is that easy, you do it all the time). In the absence of really organized central planning and real "SAT Solver Class" package managers (like `uv`, not like `pip`), a mess is more or less just nature taking its course. That's kinda how we got here (or how we got to 2016, maybe).
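
To make the "wrapping is really easy" point concrete, here's a minimal sketch using only the stdlib's `ctypes` (the library name/path is platform-dependent, so treat it as an illustration): calling straight into native code takes almost no ceremony, which is exactly why native dependencies pile up.

```python
# Minimal sketch: calling a proven C library from Python with zero build system.
# The wrapping step is nearly free, which is how native deps accumulate.
# Assumes a libm shared object is findable; the exact name varies by platform.
import ctypes
import ctypes.util

libm_name = ctypes.util.find_library("m") or "libm.so.6"
libm = ctypes.CDLL(libm_name)
libm.cos.restype = ctypes.c_double
libm.cos.argtypes = [ctypes.c_double]

print(libm.cos(0.0))  # 1.0, computed by the C runtime, not by Python
```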

But lots of language ecosystems predate serious solvers and other modern versioning tooling, so why is Python still such a conspicuous mess on this in 2025? How can friggin JavaScript have it together about 100x better?

That's where the bad incentives kick in. In the small, there is a lingering prestige attached to "AI Researcher" that makes about zero sense in a world where we're tweaking the same dozen architectures and the whole game is scaling them, but that's the way the history went. So people who need it to work once to write a paper and then move on? `pip freeze`, baby, works on my box. Docker amplifies this "socialize the costs" thing because now you can `pip freeze` your clanky shit, spin it up on 10k Hopper cards, and move on. So the highest-paid, most regarded, most clout-having people don't directly experience the pain; it's an abstraction to them.

In the large? If this shit worked then (hopefully useful oversimplification alert) FLOPs would be FLOPs. The LAPACK primitives and even more modern GEMM instructions can be spelled some fast way on pretty much any vendor's stuff. NVIDIA is usually ahead by a word-shrink or two, but ROCm in principle supports training at FP8: on CDNA (expensive) cards it actually does, while on RDNA (cheap) cards it says so on the label but crashes under load, so you can't use it if your time is worth anything.
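
For a rough sketch of what "FLOPs would be FLOPs" means in practice (this only shows the portable API surface, not whether a given card's low-precision kernels survive real load), the same GEMM call in PyTorch runs unchanged on an NVIDIA or AMD build, since ROCm builds expose their devices under the same `cuda` name:

```python
# Rough sketch: one GEMM, vendor-agnostic at the source level.
# On an NVIDIA build this dispatches to cuBLAS; on a ROCm build, to hipBLAS/rocBLAS.
# It says nothing about whether a given card's FP8/FP16 path is stable under load.
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"  # ROCm also reports as "cuda"
a = torch.randn(4096, 4096, dtype=torch.float16, device=device)
b = torch.randn(4096, 4096, dtype=torch.float16, device=device)
c = a @ b
print(c.shape, device)
```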

The big labs and FAANGs are kind of the dark horse here. In principle you'd assume Meta would want all their Torch code to run on AMD, but their incentives are complicated: they do a lot of really dumb shit that's presumably good for influential executives, because it sure isn't good for shareholders. It's also possible that they've just lost the ability to do that level of engineering; it's hard, and it can't be solved by numbers or money alone.