Comment by teo_zero
13 hours ago
The author forgot to add "fused" here, like they did in other parts of the same section.
Non-fused:
  foreach i
    y[i] = cos(x[i])
  foreach i
    z[i] = cos(y[i])
Fused, no intermediate array:
  foreach i
    t = cos(x[i])
    z[i] = cos(t)
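The temporary "t" never has to be written out to memory, so it doesn't leave the GPU. The non-fused version sweeps the data twice (read x, write y, then read y, write z), which makes you twice as dependent on memory bandwidth.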
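For anyone who wants to poke at it, here is a minimal sketch of the two loops in plain Python. It runs on the CPU, so it only shows the data flow and says nothing about actual GPU performance. The function names are made up for illustration.

```python
import math

def cos_twice_unfused(x):
    # First sweep: read all of x, write the intermediate array y.
    y = [math.cos(v) for v in x]
    # Second sweep: read all of y back, write z.
    z = [math.cos(v) for v in y]
    return z

def cos_twice_fused(x):
    # Single sweep: read x once, write z once.
    z = []
    for v in x:
        t = math.cos(v)        # temporary lives only inside this iteration
        z.append(math.cos(t))
    return z

x = [0.0, 0.5, 1.0, 2.0]
# Both versions apply the same operations in the same order to each element,
# so the results are identical; only the memory traffic differs.
same = cos_twice_unfused(x) == cos_twice_fused(x)
```

The unfused version touches four arrays' worth of memory (x, y, y again, z) and the fused one touches two (x, z), which is where the factor of two comes from.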