Comment by melodyogonna
4 hours ago
> Why should they? CUDA is a GPGPU paradigm, AMD/Apple/Intel all ship diverse raster-focused hardware, and TPUs are a systolic array. How much can you realistically expect to abstract with unified primitives?
Ah, so it seems impossible to you. These are very different kinds of hardware... It is hard enough to maintain compatibility across different hardware from the same vendor. It is very difficult to imagine building primitives for hardware with completely different memory layouts.
> How much performance do you perceive to be left on the table with native CUDA-based implementations?
Zero is the idea. And I wasn't saying there should be a native CUDA-based implementation. I'm asking you to imagine how much easier everything would have been if CUDA were cross-platform without any performance or ergonomic penalties.
Mojo is a foundational step here. The big HOW is powerful parametric programming: a lot of information can be passed at compile time, and the compiler uses it to specialize the code.
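To make the compile-time specialization idea concrete, here is a rough sketch. It uses C++ templates as an analogy, not Mojo, and it is not how Mojo or any vendor toolkit actually does it. The target structs, their lane counts, and the `dot` function are all hypothetical names made up for illustration.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical compile-time descriptions of two targets. The names and
// lane counts are assumptions for illustration, not real hardware specs.
struct NarrowTarget { static constexpr std::size_t lanes = 4; };
struct WideTarget   { static constexpr std::size_t lanes = 16; };

// One generic kernel. Target and N are compile-time parameters, so every
// instantiation has constant trip counts the compiler can unroll/vectorize.
template <typename Target, std::size_t N>
float dot(const std::array<float, N>& a, const std::array<float, N>& b) {
    constexpr std::size_t lanes = Target::lanes;
    constexpr std::size_t full = N / lanes * lanes;  // elements covered by whole blocks

    // Accumulate lane-wise over whole blocks of `lanes` elements.
    float acc[lanes] = {};
    for (std::size_t i = 0; i < full; i += lanes)
        for (std::size_t l = 0; l < lanes; ++l)
            acc[l] += a[i + l] * b[i + l];

    float sum = 0.0f;
    for (std::size_t l = 0; l < lanes; ++l) sum += acc[l];

    // The tail loop is only compiled in when N is not a multiple of lanes.
    if constexpr (N % lanes != 0) {
        for (std::size_t i = full; i < N; ++i) sum += a[i] * b[i];
    }
    return sum;
}
```

The call site stays the same for both targets, e.g. `dot<NarrowTarget>(a, b)` versus `dot<WideTarget>(a, b)`. Only the compile-time parameter changes, and the compiler generates a different loop structure for each.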