Comment by rowanG077

8 months ago

Where did he claim it is fast? As far as I can see, the only claim is that it scales linearly with cores, which it actually seems to do.

[flagged]

  • I'll give you the benefit of the doubt in case the README changed, but here's the benchmark it currently claims, against its own execution modes:

    CPU, Apple M3 Max, 1 thread: 12.15 seconds

    CPU, Apple M3 Max, 16 threads: 0.96 seconds

    GPU, NVIDIA RTX 4090, 16k threads: 0.21 seconds

    The README mentions "fast" in 2 places, neither of which compares it to other languages.

  • You're missing some context: it's not bitonic sort itself that would present an issue with GPUs, it's the "with immutable tree rotations" part, which in a naive implementation would imply some kind of memory management that would have trouble scaling to thousands of cores.

  • Yes, and those benchmarks are real. Showing linear speedup in the number of cores when writing standard code is a real achievement. If you assumed that somehow means this is a state-of-the-art compiler with super blazing performance, that is on no one but you. The README lays it out very clearly.
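
For reference, the speedups implied by the three timings quoted above can be worked out directly. This is only a minimal sketch: the function and variable names are made up for illustration, the inputs are just the numbers from the README quote, and dividing by the thread count assumes all 16 threads are equivalent, which may not hold on this chip.

```python
# Timings in seconds, copied from the benchmark quoted in the thread.
BASELINE_1_THREAD = 12.15
CPU_16_THREADS = 0.96
GPU_16K_THREADS = 0.21


def speedup(baseline_s: float, time_s: float) -> float:
    """How many times faster a run is than the single-thread baseline."""
    return baseline_s / time_s


def parallel_efficiency(baseline_s: float, time_s: float, threads: int) -> float:
    """Speedup divided by thread count; 1.0 would be perfectly linear scaling."""
    return speedup(baseline_s, time_s) / threads


cpu_speedup = speedup(BASELINE_1_THREAD, CPU_16_THREADS)
gpu_speedup = speedup(BASELINE_1_THREAD, GPU_16K_THREADS)
cpu_efficiency = parallel_efficiency(BASELINE_1_THREAD, CPU_16_THREADS, 16)

print(f"CPU, 16 threads: {cpu_speedup:.2f}x speedup")      # roughly 12.7x
print(f"GPU, 16k threads: {gpu_speedup:.2f}x speedup")     # roughly 57.9x
print(f"CPU efficiency per thread: {cpu_efficiency:.0%}")  # roughly 79%
```

The GPU figure is left as a plain speedup, since dividing by 16k GPU threads would not be a meaningful comparison with CPU threads.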