Comment by kllrnohj
3 hours ago
> Why is the NN-only portion almost as fast on an iPhone 17 compared to a V100 when the V100 has 4x the FP throughput?
It might have a sequential section, a block size that struggles to fill a V100, a large chunk of CPU-only work, or any number of things like that.
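To put rough numbers on the sequential-section case, here is a minimal sketch of Amdahl's law. The serial fractions are made up for illustration, not measured from this workload, and `amdahl_speedup` is a hypothetical helper name. The 4x figure is the throughput ratio from the quoted question.

```python
def amdahl_speedup(serial_fraction: float, parallel_speedup: float) -> float:
    """Overall speedup when only the non-serial part of the work gets faster."""
    if not 0.0 <= serial_fraction <= 1.0:
        raise ValueError("serial_fraction must be in [0, 1]")
    if parallel_speedup <= 0:
        raise ValueError("parallel_speedup must be positive")
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / parallel_speedup)


if __name__ == "__main__":
    # Hypothetical serial fractions; 4.0 is the FP throughput ratio from the question.
    for s in (0.0, 0.25, 0.5, 0.75):
        print(f"serial fraction {s:.2f}: {amdahl_speedup(s, 4.0):.2f}x overall")
```

If half the runtime were sequential or CPU-bound, 4x the FP throughput would only buy about 1.6x end to end, which is the kind of gap that could make the two devices look close.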