You're missing some context: it's not bitonic sort itself that would present an issue on GPUs, it's the "with immutable tree rotations" part. In a naive implementation, that implies some kind of memory management that would have trouble scaling to thousands of cores.
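To make the allocation pressure concrete, here's a minimal sketch (my own illustration in Python, not Bend's actual code) of an immutable left rotation on a binary tree. Because nothing is mutated, every rotation allocates fresh nodes, so a naive runtime has to service an allocation per rotation — that's the part that's hard to scale across thousands of GPU cores:

```python
from typing import NamedTuple, Optional

class Node(NamedTuple):
    value: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None

def rotate_left(n: Node) -> Node:
    # (a (b c)) -> ((a b) c): builds two new nodes,
    # leaving the original tree completely untouched.
    if n.right is None:
        return n
    return Node(n.right.value,
                Node(n.value, n.left, n.right.left),
                n.right.right)

t = Node(1, None, Node(2, None, Node(3)))
r = rotate_left(t)
assert r.value == 2 and t.value == 1  # original tree still intact
```

Per-rewrite allocation like this is exactly the workload a mutable in-place implementation avoids, which is why "it runs on a GPU at all" is the interesting part of the benchmark.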
Yes, and those benchmarks are real. Showing linear speedup in the number of cores when writing standard code is a real achievement. If you assumed that somehow means this is a state-of-the-art compiler with blazing performance, that's on no one but you. The README lays it out very clearly.
The irony in you blasting all over this thread is that you don't know how it even works. You have zero idea whether their claims of scaling linearly are causing bottlenecks elsewhere, as you state. If you read the actual docs on this, it's clear that the "compiler" part of the compiler was put on the back burner while the parallelization was figured out, and now that that's done, a bunch of optimizations will come in the next year.
I'll give you the benefit of the doubt in case the README changed, but here's the benchmark it currently claims against its own execution modes:
CPU, Apple M3 Max, 1 thread: 12.15 seconds
CPU, Apple M3 Max, 16 threads: 0.96 seconds
GPU, NVIDIA RTX 4090, 16k threads: 0.21 seconds
The README mentions "fast" in 2 places, none of which is comparing to other languages.
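Working out the implied speedups from those numbers (my own arithmetic, not a claim from the README):

```python
# Speedups implied by the README's benchmark figures.
single_thread = 12.15  # seconds, M3 Max, 1 thread
multi_thread = 0.96    # seconds, M3 Max, 16 threads
gpu = 0.21             # seconds, RTX 4090, 16k threads

print(f"16-thread speedup: {single_thread / multi_thread:.1f}x")  # ~12.7x
print(f"GPU speedup:       {single_thread / gpu:.1f}x")           # ~57.9x
```

So the 16-thread number is close to linear (12.7x on 16 cores), which is the scaling claim; nothing there says anything about absolute performance versus other languages.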
If you want to bring up CS101 so badly, surely Turing machines and lambda calculus would be more relevant.
The actually interesting claim is that somebody has found a practical use for Interaction Combinators as a computing model for putting GPUs to work.
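For anyone unfamiliar with the model, here's a toy sketch (my own heavily simplified illustration, not HVM's actual data structures) of the key property: an interaction-combinator rewrite only ever touches the two agents involved, so independent rewrites anywhere in the net can fire simultaneously — which is what makes it a plausible fit for thousands of GPU threads:

```python
# Toy "annihilation" rule: when two agents of the same kind meet at
# their principal ports, both are erased and their auxiliary ports are
# wired together. (Real interaction combinators wire ports according
# to the agent kind; plain pairwise wiring here is a simplification.)

def annihilate(a: dict, b: dict) -> list:
    """Return the new wires created by erasing two same-kind agents."""
    assert a["kind"] == b["kind"], "annihilation needs matching agents"
    return list(zip(a["aux"], b["aux"]))

# Two constructor agents meeting principal-to-principal:
pairs = annihilate({"kind": "CON", "aux": ["x1", "x2"]},
                   {"kind": "CON", "aux": ["y1", "y2"]})
# pairs == [("x1", "y1"), ("x2", "y2")]
```

Since each rewrite is purely local, no global lock or coordination is needed between rewrites that don't share agents — that locality is the whole pitch.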