Comment by saagarjha
2 days ago
They didn’t. They used PTX, which is what CUDA C++ compiles down to, but which is part of the CUDA toolchain. All major players have needed to do this because the intrinsics for the latest accelerators are not actually exposed in the C++ API, which means using them requires inline PTX at the very minimum.
No comments yet
Contribute on Hacker News ↗