Comment by godelski

8 months ago

I haven't looked at the post in the detail it deserves, but given your graphs the workload looks pretty bursty. I'd suspect there are some good I/O optimizations or some predication to be had. That last void main block definitely looks ripe for it. But I'd listen to Knuth, premature optimization and all, so grab a profiler first. I wouldn't be surprised if you're already nearing peak performance. Also, NVIDIA GPUs have a lot of special tricks that can be exploited but are buried in documentation... if you haven't already seen it (I suspect you have), you'd be interested in "GPU Gems". Gems 2 has some good stuff on predication.
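To make the predication idea concrete, here's a minimal sketch in plain C. It is not your shader: the function names, the threshold, and the 2.0/0.5 scale factors are all made up for illustration. The first version branches on the data; the second computes both results and selects one with a 0/1 mask, so every thread runs the same instructions.

```c
/* Hypothetical example: scale x up above a threshold, down otherwise. */

/* Branching version: which path runs depends on the data. */
static float scale_branch(float x, float threshold) {
    if (x > threshold) {
        return x * 2.0f;
    }
    return x * 0.5f;
}

/* Predicated version: compute both candidates, then blend with a mask.
 * The comparison yields 1 or 0, so exactly one term survives. */
static float scale_predicated(float x, float threshold) {
    float mask = (float)(x > threshold);   /* 1.0f or 0.0f */
    float if_true = x * 2.0f;
    float if_false = x * 0.5f;
    return mask * if_true + (1.0f - mask) * if_false;
}
```

In GLSL the same shape is usually written with `step` and `mix` (roughly; `step` treats the equal case differently from `>`). Whether it actually wins depends on the hardware and the compiler, which often turns small branches into selects on its own, so this is something to check in the profiler rather than assume.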

But also, really good work! You should be proud of this! Squeezing that much out of that hardware is no easy feat.