Slacker News

Comment by zzzoom

18 hours ago

Prefill (GEMM) is compute-bound; decode (GEMV) is memory-bound.
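One rough way to see the split is arithmetic intensity: FLOPs performed per byte moved, compared against the hardware's ratio of peak FLOP/s to memory bandwidth. The sketch below is illustrative only. It assumes fp16 (2 bytes per element), a square 4096 x 4096 projection, ideal caching (each operand crosses memory once), and a hypothetical accelerator with round-number peaks. None of these figures come from the comment, and the helper names are made up for the example.

```python
# Sketch: arithmetic intensity of (m x k) @ (k x n), assuming fp16 (2 bytes/element)
# and ideal caching, i.e. every operand and the output cross memory exactly once.

# Hypothetical accelerator with round numbers, not any real product's spec sheet.
PEAK_FLOPS = 1000e12      # FLOP/s
PEAK_BANDWIDTH = 3e12     # bytes/s
MACHINE_BALANCE = PEAK_FLOPS / PEAK_BANDWIDTH  # ~333 FLOP/byte


def matmul_intensity(m: int, k: int, n: int, bytes_per_elem: int = 2) -> float:
    """FLOPs per byte moved for an (m x k) activation times a (k x n) weight."""
    flops = 2 * m * k * n  # one multiply and one add per output term
    bytes_moved = bytes_per_elem * (m * k + k * n + m * n)
    return flops / bytes_moved


def bound(m: int, k: int, n: int) -> str:
    """Classify by comparing intensity against the machine balance point."""
    return "compute-bound" if matmul_intensity(m, k, n) > MACHINE_BALANCE else "memory-bound"


if __name__ == "__main__":
    k = n = 4096  # hypothetical hidden size
    for label, m in [("decode, 1 token (GEMV)", 1), ("prefill, 2048 tokens (GEMM)", 2048)]:
        print(f"{label}: {matmul_intensity(m, k, n):.1f} FLOP/byte -> {bound(m, k, n)}")
```

With m = 1 each weight is loaded for a single multiply-add, so intensity stays near 1 FLOP/byte whatever the model size. With many prompt tokens in one pass, each weight is reused across all of them and intensity grows with m, which is what pushes prefill past the balance point.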


Const-me  12 hours ago

> decode (GEMV) is memory bound

Decode with batch size 1 is GEMV. Batching turns decode into a GEMM as well.
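A minimal sketch of why batching helps: if B sequences each need one decode step through the same weight matrix, stacking their activation rows gives a single (B x k) @ (k x n) GEMM, and each weight row can be fetched once for the whole batch. The helpers `gemv`, `gemm` and `decode_intensity` are hypothetical names for illustration, written in pure Python, and the intensity figure assumes fp16 and ideal caching. This only covers the weight projections; attention over each sequence's own KV cache does not share data across the batch in the same way.

```python
# Sketch: B decode steps that share one weight matrix W collapse into a single GEMM.
# Pure Python for clarity; real kernels tile and vectorize this.


def gemv(x: list[float], W: list[list[float]]) -> list[float]:
    """y = x @ W for one row vector x (length k) and W of shape k x n."""
    k, n = len(W), len(W[0])
    return [sum(x[l] * W[l][j] for l in range(k)) for j in range(n)]


def gemm(X: list[list[float]], W: list[list[float]]) -> list[list[float]]:
    """Y = X @ W for X of shape B x k. Each row of W is read once and reused by all B rows."""
    B, k, n = len(X), len(W), len(W[0])
    Y = [[0.0] * n for _ in range(B)]
    for l in range(k):
        w_row = W[l]  # fetched once per batch instead of once per sequence
        for b in range(B):
            xb = X[b][l]
            for j in range(n):
                Y[b][j] += xb * w_row[j]
    return Y


def decode_intensity(B: int, k: int, n: int, bytes_per_elem: int = 2) -> float:
    """FLOPs per byte for the batched projection, assuming fp16 and ideal caching."""
    return (2 * B * k * n) / (bytes_per_elem * (B * k + k * n + B * n))


if __name__ == "__main__":
    W = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]  # toy weights, k=3, n=2
    X = [[1.0, 0.0, 1.0], [0.0, 1.0, 0.0]]    # B=2 sequences, one new token each
    assert gemm(X, W) == [gemv(x, W) for x in X]
    print(gemm(X, W))  # [[6.0, 8.0], [3.0, 4.0]]
    for B in (1, 8, 64):
        print(B, round(decode_intensity(B, 4096, 4096), 1))
```

While B stays small relative to the hidden size, intensity rises roughly linearly with B, so a batched decode step moves toward the compute-bound regime the same way prefill does.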
