Slacker News

Comment by WithinReason

2 days ago

This is 11 bit operations plus a subtract, which I assume is ~11 clocks, while you can just do:

l1 = dot(A[:11000000], B[:11000000])
l2 = dot(A[:00110000], B[:00110000])
l3 = dot(A[:00001100], B[:00001100])
l4 = dot(A[:00000011], B[:00000011])

result = l1 + l2 * 4 + l3 * 16 + l4 * 64

which is 8 bit operations (the masks) and four 8-bit dot products, likely 8 clocks with less serial dependence
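
To make the shape of this concrete, here is a rough Python sketch. It assumes that A[:mask] means "AND every element with mask", which is only my reading of the notation. The weights 1, 4, 16, 64 are copied verbatim from the formula above; whether the masked fields also need shifting depends on how the values are packed in the parent post, so treat this as an illustration rather than a drop-in kernel. The helper names (dot, masked_dots, combine) are made up for the example.

```python
# Assumption: A[:mask] is read as "keep only the bits selected by mask".
MASKS = (0b11000000, 0b00110000, 0b00001100, 0b00000011)
WEIGHTS = (1, 4, 16, 64)  # coefficients taken as-is from the comment


def dot(xs, ys):
    """Plain integer dot product of two equal-length sequences."""
    return sum(x * y for x, y in zip(xs, ys))


def masked_dots(a, b):
    """Four dot products, one per mask; none depends on another's result."""
    return [dot([x & m for x in a], [y & m for y in b]) for m in MASKS]


def combine(partials):
    """Weighted sum l1 + l2*4 + l3*16 + l4*64."""
    return sum(w * p for w, p in zip(WEIGHTS, partials))


if __name__ == "__main__":
    a = [0b11100100, 0b00011011]
    b = [0b01010101, 0b10101010]
    print(combine(masked_dots(a, b)))
```

The point of the structure is that the four partial dot products can be issued independently, and only the final weighted sum is serial.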
