Comment by tosh

1 day ago

A100: 312 TFLOP/s for FP16

but it is very impressive how far modern CPUs get as well (also in smartphones!)

p1esk 1 day ago

Intel Xeon 6980P: 128 cores x 1024 FP16 FLOP/cycle/core x 3.2 GHz: 419 TFLOP/s
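
For anyone who wants to check the arithmetic, here is a minimal sketch of the peak-throughput formula used above (cores x FLOP/cycle/core x clock). The function name `peak_tflops` is made up for illustration, and the result is a theoretical ceiling that assumes every core sustains the quoted clock and per-cycle rate, which real workloads may not reach.

```python
def peak_tflops(cores: int, flop_per_cycle_per_core: int, clock_ghz: float) -> float:
    """Theoretical peak in TFLOP/s: cores x FLOP/cycle/core x clock."""
    flop_per_second = cores * flop_per_cycle_per_core * clock_ghz * 1e9
    return flop_per_second / 1e12


# Figures quoted in the comment above (assumed, not measured).
xeon = peak_tflops(cores=128, flop_per_cycle_per_core=1024, clock_ghz=3.2)
print(f"{xeon:.1f} TFLOP/s")  # prints "419.4 TFLOP/s"
```

Plugging in other core counts or clocks shows how sensitive the headline number is to the assumed sustained frequency.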

tosh 1 day ago
I'm not saying "GPU more brrt than CPU"
I found the comparison interesting
on Intel Xeon 6980P with 419 TFLOP/s it is still (maybe even more?) interesting to ask:
how much throughput can you reach with Python, Python with lib x, y, z, with C++ like this, with C++ like that etc etc and why?
no?
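
One way to put a number on the question above is to time a matrix multiply and divide the operation count by the elapsed time. The following is a rough sketch using only the standard library; `matmul_pure` and `measure_flops` are hypothetical helper names, the 64x64 size is arbitrary, and the 2*n*k*m operation count is the usual multiply-plus-add convention rather than an exact tally.

```python
import time


def matmul_pure(a, b):
    """Naive matrix multiply on lists of lists, no libraries."""
    bt = list(zip(*b))  # transpose b so each column can be walked like a row
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]


def measure_flops(fn, a, b):
    """Run fn(a, b) once and return achieved FLOP/s, counting 2*n*k*m operations."""
    n, k, m = len(a), len(b), len(b[0])
    start = time.perf_counter()
    fn(a, b)
    elapsed = time.perf_counter() - start
    return 2 * n * k * m / elapsed


n = 64  # small on purpose: pure Python is slow
a = [[float(i + j) for j in range(n)] for i in range(n)]
b = [[float(i - j) for j in range(n)] for i in range(n)]
rate = measure_flops(matmul_pure, a, b)
print(f"pure Python: {rate / 1e9:.4f} GFLOP/s")
```

The same `measure_flops` helper could time a library routine or a C++ extension instead of `matmul_pure`, and comparing each result against the theoretical peak gives the fraction of the hardware actually used.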

p1esk 1 day ago

No one in their right mind would use pure Python to do matrix multiplication. It’s like using a screwdriver to hammer nails into wood.
But this discussion is even more bizarre than comparing a screwdriver to a hammer; it’s like comparing a screwdriver to a nail.