Comment by godelski
8 months ago
I haven't look at the post in the detail it deserves, but given your graphs the workload looks pretty bursty. I'd suspect there are some good I/O optimizations or some predication. Definitely that last void main block looks ripe for that. But I'd listen to Knuth, premature optimization and all, so grab a profiler. I wouldn't be surprised if you're nearing peak performance. Also NVIDIA GPUs have a lot of special tricks that can be exploited but are buried in documentation... if you haven't already seen it (I suspect you have), you'd be interested in "GPU Gems". Gems 2 has some good stuff on predication.
But also, really good work! You should be proud of this! Squeezing that much out of that hardware is no easy feat.
No comments yet
Contribute on Hacker News ↗