Okay, no. I know I called out performance in my post, but that was just from my own observations. It surprised me to see something run that much slower than pure Python. If you show me a near-Python code example in a new language, then as someone who mostly writes Python, I'm going to rewrite it in Python and see how it compares performance-wise.
The authors never made any kind of false claims at all. You're reading a lot into both their README and my post.
They've updated the README for a bit of clarity, but even re-reading the README as it was when I looked this morning (and a few versions from before that), it never claimed to be fast. The claims are all about the features it does have, around parallelisation.
[flagged]
Where did he claim it is fast? As far as I can see the only claim is that it scales linearly with cores. Which it actually seems to do.
[flagged]
I'll give you the benefit of the doubt in case the README changed, but here's the benchmark it currently claims, comparing its own execution modes against each other:
CPU, Apple M3 Max, 1 thread: 12.15 seconds
CPU, Apple M3 Max, 16 threads: 0.96 seconds
GPU, NVIDIA RTX 4090, 16k threads: 0.21 seconds
The README mentions "fast" in 2 places, neither of which compares it to other languages.
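For anyone who wants to check the arithmetic on those numbers, here's a quick sketch in Python. The timings are the ones quoted above; the variable names are mine, and the "efficiency" figure just assumes the 16-thread run used 16 cores. Note that the GPU ratio compares different hardware, so it's a rough speedup figure, not a scaling measurement.

```python
# Timings quoted from the README benchmark above, in seconds.
timings = {
    "cpu_1_thread": 12.15,
    "cpu_16_threads": 0.96,
    "gpu_16k_threads": 0.21,
}

baseline = timings["cpu_1_thread"]

# Speedup relative to the single-threaded CPU run.
cpu_speedup = baseline / timings["cpu_16_threads"]
gpu_speedup = baseline / timings["gpu_16k_threads"]

# Parallel efficiency: fraction of ideal linear scaling achieved,
# assuming the 16-thread run maps to 16 cores (an assumption).
cpu_efficiency = cpu_speedup / 16

print(f"CPU 16 threads: {cpu_speedup:.2f}x speedup, {cpu_efficiency:.0%} efficiency")
print(f"GPU vs 1 CPU thread: {gpu_speedup:.2f}x speedup")
```

So the 16-thread run comes out at roughly 12.7x over one thread, which is what "scales with cores" is referring to. None of this says anything about how it compares to other languages.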
You're missing some context. It's not bitonic sort itself that would present an issue on GPUs; it's the "with immutable tree rotations" part, which in a naive implementation would imply some kind of memory management that would have trouble scaling to thousands of cores.
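To illustrate the distinction, here's a minimal sketch of bitonic sort in plain Python, written in an immutable style (every step returns a new list). This is not Bend's tree-based version and not anyone's actual benchmark code; it assumes the input length is a power of two. The thing to notice is that each step allocates fresh data instead of swapping in place, which is exactly the part that needs memory management to work across many cores.

```python
def bitonic_merge(xs, ascending):
    """Merge a bitonic sequence into sorted order, returning a new list."""
    if len(xs) <= 1:
        return list(xs)
    half = len(xs) // 2
    pairs = list(zip(xs[:half], xs[half:]))
    # Compare-and-swap each element with its partner half a list away.
    # No in-place mutation: both halves are freshly allocated.
    if ascending:
        lo = [min(a, b) for a, b in pairs]
        hi = [max(a, b) for a, b in pairs]
    else:
        lo = [max(a, b) for a, b in pairs]
        hi = [min(a, b) for a, b in pairs]
    # The two recursive calls are independent, so they could run in parallel.
    return bitonic_merge(lo, ascending) + bitonic_merge(hi, ascending)


def bitonic_sort(xs, ascending=True):
    """Sort a list whose length is a power of two (assumption of this sketch)."""
    if len(xs) <= 1:
        return list(xs)
    half = len(xs) // 2
    # Sort the halves in opposite directions to form a bitonic sequence.
    first = bitonic_sort(xs[:half], True)
    second = bitonic_sort(xs[half:], False)
    return bitonic_merge(first + second, ascending)


result = bitonic_sort([3, 7, 4, 8, 6, 2, 1, 5])
print(result)
```

On a flat array with in-place swaps this maps onto a GPU easily. With immutable structures, every one of those new lists (or tree nodes) has to be allocated and later freed by something.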
Yes, and those benchmarks are real. Showing linear speedup in the number of cores when writing standard code is a real achievement. If you assumed that somehow means this is a state-of-the-art compiler with blazing performance, that's on no one but you. The README lays it out very clearly.