
Comment by teo_zero

12 hours ago

The author forgot to add "fused" here, like they did in other parts of the same section.

Non-fused:

  foreach i
    y[i] = cos(x[i])
  foreach i
    z[i] = cos(y[i])

Fused, no intermediate variable:

  foreach i
    t = cos(x[i])
    z[i] = cos(t)

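The temporary "t" can stay in a register and never has to be written out to GPU memory. The non-fused version sweeps the intermediate array twice (write y, then read it back), which makes it roughly twice as dependent on memory bandwidth.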
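
For anyone who wants to poke at it, here is a minimal CPU-side sketch of the same two shapes in plain Python. The function names are made up for illustration, and this only shows the structure (an intermediate array versus a per-element temporary), not actual GPU behaviour or timings.

```python
import math

def cos_cos_unfused(x):
    # Pass 1: materialise the whole intermediate array y.
    y = [math.cos(v) for v in x]
    # Pass 2: sweep y again to produce z.
    return [math.cos(v) for v in y]

def cos_cos_fused(x):
    z = []
    for v in x:
        t = math.cos(v)        # temporary lives only for this iteration
        z.append(math.cos(t))  # no intermediate array is ever stored
    return z

x = [0.0, 0.5, 1.0, 2.0]
# Both versions apply the same operations per element, so results match.
assert cos_cos_unfused(x) == cos_cos_fused(x)
```

Both compute cos(cos(x[i])); the only difference is whether y exists as a full array between the two steps.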
