Comment by codedokode
11 hours ago
I think it is possible to run CPU code on a GPU (including a whole OS), because a GPU has registers, memory, and arithmetic and branch instructions, and that should be enough. However, it would be able to use only a few cores out of many thousands, because GPU cores are effectively wide SIMD cores grouped into clusters, and CPU-style code would use only a single SIMD lane. Am I wrong?
This seems correct to me. Of course you'd need to build a CPU emulator to run CPU code. A single GPU core is apparently about 100x slower than a single CPU core. With emulation a 1000x slowdown might be expected. So with a lot of handwaving, expect performance similar to a 4 MHz processor.
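To make the handwaving explicit, here is the arithmetic as a tiny Python sketch. The 4 GHz baseline clock is my assumption (it is the figure that makes a 1000x slowdown land on 4 MHz), and the slowdown factor is just the guess from the comment above, not a measurement.

```python
# Back-of-envelope only; every number here is a rough guess, not a benchmark.
CPU_CLOCK_HZ = 4e9      # assumption: a ~4 GHz desktop CPU core as the baseline
TOTAL_SLOWDOWN = 1000   # the comment's guess for a slow GPU core plus emulation overhead


def effective_clock_hz(cpu_clock_hz: float, slowdown: float) -> float:
    """Clock rate of a CPU that would feel about as fast as the emulated one."""
    return cpu_clock_hz / slowdown


equivalent_hz = effective_clock_hz(CPU_CLOCK_HZ, TOTAL_SLOWDOWN)
print(f"{equivalent_hz / 1e6:.0f} MHz")  # prints "4 MHz"
```

Plug in a different baseline clock or slowdown guess and the estimate scales linearly.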
Obviously code designed for a GPU is much faster. You could probably build a reasonable OS that runs on the GPU.
Given enough time, we'll all loop back around to the Xeon Phi: https://en.wikipedia.org/wiki/Xeon_Phi
It was ahead of its time!
When I was in grad school I tried getting my hands on a Phi, but it seemed impossible.
Xeon Phi was so cool. I wanted to use the ones we had so much... but couldn't find any applications that would benefit enough to make it worth the effort. I guess that's why it died lol.
GPUs having thousands of cores is just silly marketing newspeak.
They rebranded SIMD lanes as "cores". For example, Nvidia 5000 series GPUs have 50-170 SMs, which are the equivalent of CPU cores there. So more than desktops, less than the bigger server CPUs. By this math each AVX-512 CPU core has 16-64 "GPU cores".
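A rough sketch of the lane counting behind that 16-64 figure, in Python. The 16 is just 512 bits divided into 32-bit elements. Reading the 64 as 8-bit elements is my assumption about how the upper bound was reached; the comment does not say.

```python
def simd_lanes(vector_bits: int, element_bits: int) -> int:
    """Number of elements one vector register holds, i.e. 'cores' by GPU marketing math."""
    return vector_bits // element_bits


AVX512_BITS = 512

print(simd_lanes(AVX512_BITS, 32))  # 16 lanes of 32-bit elements
print(simd_lanes(AVX512_BITS, 8))   # 64 lanes of 8-bit elements (assumed reading of the upper bound)
```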
170 compute units is still a crapload of 'em for a non-server platform with non-server platform requirements. So the broad "lots of cores" point is still true, just highly overstated, as you said. Plus, those cores are running the equivalent of n-way SMT processing, which gives you an even higher crapload of logical threads. And these logical threads can also access very wide SIMD when relevant, which even early Intel E-cores couldn't. All of that absolutely matters.
Each SM can typically schedule 4 warps so it’s more like 400 “cores” each with 1024-bit SIMD instructions. If you look at it this way, they clearly outclass CPU architectures.
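For anyone who wants to check the numbers, a small Python sketch of that counting. The 4 warps per SM and the 50-170 SM range come from the comments above. The 32 threads per warp is the standard NVIDIA warp width, and 32-bit lanes are my assumption for where the 1024-bit figure comes from.

```python
WARP_THREADS = 32        # NVIDIA warp width
LANE_BITS = 32           # assumption: 32-bit elements per lane
WARPS_PER_SM = 4         # from the comment: warps an SM can schedule at once


def warp_simd_bits(threads: int = WARP_THREADS, lane_bits: int = LANE_BITS) -> int:
    """Width of one warp viewed as a single SIMD instruction."""
    return threads * lane_bits


def cpu_like_cores(sm_count: int, warps_per_sm: int = WARPS_PER_SM) -> int:
    """SMs times warps scheduled per SM: the 'cores' count used in the comment."""
    return sm_count * warps_per_sm


print(warp_simd_bits())          # 1024
for sms in (50, 170):            # the SM range quoted earlier in the thread
    print(sms, cpu_like_cores(sms))  # 50 -> 200, 170 -> 680
```

So the "400" is a middle-of-the-range figure: the quoted SM counts give roughly 200 to 680 such "cores".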