Comment by jauntywundrkind
1 day ago
Google's work on JAX, PyTorch, TensorFlow, and the more general XLA underneath is exactly the kind of anti-moat everyone has been clamoring for.
Anti-moat like commoditizing the compliment?
If they get things like PyTorch to work well without caring what hardware it is running on, it erodes Nvidia's CUDA moat. Nvidia's chips are excellent, without doubt, but their real moat is the ecosystem around CUDA.
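Here's roughly what that hardware-agnostic surface looks like in practice, as a minimal sketch assuming a stock PyTorch install: the same model code runs on an Nvidia GPU, a ROCm build, or the CPU, and only the device string changes.

```python
import torch

# Minimal sketch: the high-level API doesn't care what hardware is underneath.
# The same module and ops dispatch to CUDA/cuBLAS, ROCm, or CPU kernels.
device = "cuda" if torch.cuda.is_available() else "cpu"

model = torch.nn.Linear(512, 256).to(device)
x = torch.randn(32, 512, device=device)
y = model(x)

print(y.shape, y.device)
```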
The problem is that "hardware-agnostic PyTorch" is a myth, much like Java's "write once, run anywhere". At the high level (API), the code looks the same, but as soon as you start optimizing for performance, you inevitably drop down to CUDA. As long as researchers are writing their new algorithms in CUDA because it's the de facto language of science, Google will forever be playing catch-up, having to port these algorithms to XLA. An ecosystem is, after all, people and their habits, not just libraries.
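To make the "drop down to CUDA" point concrete, here is a hedged sketch of what the last-mile path typically looks like from Python: a hypothetical fused op (`scale_add` is made up for illustration) compiled inline via `torch.utils.cpp_extension.load_inline`. The moment code like this exists, it only runs on Nvidia's toolchain.

```python
import torch
from torch.utils.cpp_extension import load_inline

# Hypothetical fused kernel written directly in CUDA; requires an Nvidia GPU and nvcc.
cuda_src = r"""
__global__ void scale_add_kernel(const float* x, const float* y, float* out,
                                 float alpha, int64_t n) {
    int64_t i = blockIdx.x * (int64_t)blockDim.x + threadIdx.x;
    if (i < n) out[i] = x[i] + alpha * y[i];
}

torch::Tensor scale_add(torch::Tensor x, torch::Tensor y, double alpha) {
    auto out = torch::empty_like(x);
    int64_t n = x.numel();
    int threads = 256;
    int blocks = (int)((n + threads - 1) / threads);
    scale_add_kernel<<<blocks, threads>>>(
        x.data_ptr<float>(), y.data_ptr<float>(), out.data_ptr<float>(),
        static_cast<float>(alpha), n);
    return out;
}
"""

cpp_src = "torch::Tensor scale_add(torch::Tensor x, torch::Tensor y, double alpha);"

ext = load_inline(name="scale_add_ext",
                  cpp_sources=cpp_src,
                  cuda_sources=cuda_src,
                  functions=["scale_add"])

x = torch.randn(1 << 20, device="cuda")
y = torch.randn(1 << 20, device="cuda")
out = ext.scale_add(x, y, 0.5)  # same result as x + 0.5 * y, but now CUDA-only
```

Once a codebase accumulates dozens of kernels like this, "run it on other hardware" stops being a recompile and becomes a port.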
I'd love for someone to give me an alternative to CUDA, but I don't primarily use GPUs for inference; I do 64-bit unsigned integer workloads, and the only people who seem to care even a little about this currently are Nvidia, if imperfectly.
I _really_ want an alternative, but the architecture churn imposed by targeting ROCm for, say, an MI350X is brutal. The way their wavefronts and everything else work is different enough that if you're chasing last-mile perf (which for GPUs unfortunately stretches into the 2-5x range), you're eating a lot of pain to get the same cost-efficiency out of AMD hardware.
FPGAs aren't really any more cost-effective unless the $/kWh goes into the stratosphere, which is a hypothetical I don't care to contemplate.
PyTorch is only part of it. There is still a huge amount of CUDA that isn’t just wrapped by PyTorch and isn’t easily portable.
Yes!
PyTorch, JAX, and TensorFlow are all examples to me of very capable products that compete very well in the ML space.
But more broadly, work like XLA and IREE gives us very interesting toolkits for mapping a huge variety of computation onto many types of hardware. While PyTorch et al. are fine example applications, things you can build with the toolkit, XLA is the Big Tent idea: the layer that erodes not just specific CUDA use cases, but lets hardware in general be more broadly useful.
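As a rough illustration of that big-tent idea (a minimal sketch, assuming a standard JAX install): the function below is traced once and lowered through XLA to whatever backend happens to be present, CPU, GPU, or TPU, without the user writing any backend-specific code.

```python
import jax
import jax.numpy as jnp

# Minimal sketch: jax.jit hands the traced computation to XLA, which compiles
# it for whichever backend was detected (CPU, CUDA/ROCm GPU, or TPU).
@jax.jit
def fused_step(w, b, x):
    return jax.nn.relu(x @ w + b)

key = jax.random.PRNGKey(0)
w = jax.random.normal(key, (512, 256))
b = jnp.zeros(256)
x = jax.random.normal(key, (32, 512))

print(jax.devices())              # backend XLA picked, e.g. [CpuDevice(id=0)]
print(fused_step(w, b, x).shape)  # (32, 256)
```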
*complement