Comment by pixelpoet
9 hours ago
GPUs aren't fast because they run standard CPU code with magic pixie dust; they're fast because they're specialised vector processors running specialised vector code.
CUDA can also do C++ new, delete, virtual functions, exception handling and all the rest. And if you use that stuff, you're basically making an aeroplane flap its wings, with all the performance implications that come with such an abomination.
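For the curious, here's a minimal sketch of the kind of code being criticised, with made-up names (Bird, FlappingBird, flap) and sizes chosen purely for illustration. Device-side new/delete and virtual dispatch do compile, but every allocation contends on a single device-wide heap and every virtual call is a vtable load plus an indirect branch per thread:

    // CUDA: device-side heap allocation and virtual dispatch, all legal,
    // all slow relative to plain data-parallel code.
    #include <cstdio>

    struct Bird {
        __device__ virtual float fly(float x) const { return x; }
        __device__ virtual ~Bird() {}
    };

    struct FlappingBird : Bird {
        __device__ float fly(float x) const override { return x * 0.5f; }
    };

    __global__ void flap(float *out) {
        // Per-thread heap allocation: contends on one device-wide
        // allocator instead of living in registers or shared memory.
        Bird *b = new FlappingBird();
        // Virtual call: a vtable load plus an indirect branch per thread.
        out[threadIdx.x] = b->fly((float)threadIdx.x);
        delete b;
    }

    int main() {
        float *out;
        cudaMalloc(&out, 32 * sizeof(float));
        flap<<<1, 32>>>(out);
        cudaDeviceSynchronize();
        float host[32];
        cudaMemcpy(host, out, sizeof(host), cudaMemcpyDeviceToHost);
        printf("%f\n", host[1]);
        cudaFree(out);
        return 0;
    }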
inb4 these guys start running Python and Ruby on GPU for "speed", and while they're at it, they should send a fax to Intel and AMD saying "Hey guys, why do you keep forgetting to put the magic go-fast pixie dust into your CPUs, are you stupid?"
Nope, the magic pixie dust language that was supposed to run Python-like code on GPU was Mojo /s
This is really just leveraging Rust's existing, unique fit across HPC/numerics, embedded programming, low-level systems programming and even old retro-computing targets, and trying to extend that fit to the GPU via broad characteristics that are quite specific to Rust and absolutely relevant across most or all of those areas.
The real GPU pixie dust is called "lots of slow but efficient compute units", "barrel processing", "VRAM/HBM" and "non-flat address space(s) with explicit local memories" (see the sketch after the footnote for what that last one looks like in practice). And of course "wide SIMD+SPMD[0]", which is the part you already mentioned and is in fact somewhat harder to target outside of special cases (though neural inference absolutely relies on it!). But never mind that. A lot of existing CPU code that's currently bottlenecked on memory access throughput would absolutely benefit from being seamlessly run on GPU.
[0] SPMD is the proper established name for what people casually call SIMT
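A minimal sketch of what "explicit local memories" buys you, with hypothetical names (block_sum, tile) and a fixed block size of 256 chosen for illustration: each block stages its slice of global memory into on-chip __shared__ memory (a separate address space), then reduces it there, and every thread runs the same program in SPMD fashion:

    // CUDA: stage data in explicit per-block shared memory, then do a
    // tree reduction entirely on-chip instead of in global memory.
    #include <cstdio>

    __global__ void block_sum(const float *in, float *out) {
        __shared__ float tile[256];          // explicit per-block local memory
        unsigned t = threadIdx.x;
        tile[t] = in[blockIdx.x * 256 + t];  // SPMD: every thread runs this
        __syncthreads();
        // Tree reduction inside the block, entirely in shared memory.
        for (unsigned s = 128; s > 0; s >>= 1) {
            if (t < s) tile[t] += tile[t + s];
            __syncthreads();
        }
        if (t == 0) out[blockIdx.x] = tile[0];
    }

    int main() {
        const int N = 256;
        float host_in[N], host_out;
        for (int i = 0; i < N; ++i) host_in[i] = 1.0f;
        float *in, *out;
        cudaMalloc(&in, N * sizeof(float));
        cudaMalloc(&out, sizeof(float));
        cudaMemcpy(in, host_in, sizeof(host_in), cudaMemcpyHostToDevice);
        block_sum<<<1, N>>>(in, out);
        cudaMemcpy(&host_out, out, sizeof(float), cudaMemcpyDeviceToHost);
        printf("sum = %f\n", host_out);      // expect 256.0
        cudaFree(in); cudaFree(out);
        return 0;
    }

The point of the staging step is exactly the non-flat address space: the reduction's repeated reads and writes hit fast on-chip memory rather than the global address space.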