Comment by hwspeed

1 month ago

Classic case of optimizing the wrong thing. I've hit similar issues with ML training pipelines where GPU utilization looks terrible because data loading is the bottleneck. The profiler tells you the GPU kernel is fast, but it doesn't show you that the GPU is sitting idle 80% of the time waiting for the next batch. Amdahl's law is brutal when you've got a serial component in your pipeline: no matter how fast you make the kernel, a step can never take less time than the serial loading part does.
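To put rough numbers on the Amdahl point, here's a quick sketch. It assumes loading and GPU compute don't overlap at all (no prefetching), and it treats the GPU's share of step time as 20%, the flip side of that 80% idle figure. `amdahl_speedup` is just an illustrative helper name, not from any library.

```python
def amdahl_speedup(accelerated_fraction: float, factor: float) -> float:
    """Overall speedup when only `accelerated_fraction` of the runtime gets `factor`x faster.

    Amdahl's law: S = 1 / ((1 - p) + p / s)
    """
    if not 0.0 <= accelerated_fraction <= 1.0:
        raise ValueError("accelerated_fraction must be in [0, 1]")
    if factor <= 0:
        raise ValueError("factor must be positive")
    serial_fraction = 1.0 - accelerated_fraction  # e.g. time spent waiting on the data loader
    return 1.0 / (serial_fraction + accelerated_fraction / factor)


if __name__ == "__main__":
    # Assumed split: GPU compute is 20% of each step, loading is the other 80%.
    print(f"{amdahl_speedup(0.2, 10):.2f}")            # kernel 10x faster -> 1.22
    print(f"{amdahl_speedup(0.2, float('inf')):.2f}")  # infinitely fast kernel -> 1.25
```

Even an infinitely fast kernel only buys you 1.25x under those assumptions; the loader is where the time actually goes.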
