zzzoom 18 hours ago
Prefill (GEMM) is compute bound, decode (GEMV) is memory bound.
Const-me 12 hours ago
> decode (GEMV) is memory bound
Decode with batch size 1 is GEMV. With batching, decode becomes GEMM too.
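To put rough numbers on this, here is a minimal sketch of the arithmetic intensity (FLOPs per byte moved) of one weight multiply `y = x @ W` at different batch sizes. The 4096x4096 layer, the fp16 element size, and the function name are assumptions for illustration, not anything from the comments above, and the byte count ignores caches, the KV cache, and attention.

```python
def arithmetic_intensity(batch, d_in, d_out, bytes_per_elem=2):
    """FLOPs per byte for y = x @ W, with x: (batch, d_in) and W: (d_in, d_out).

    Assumes each operand is read or written exactly once (no cache reuse modeled).
    """
    # One multiply and one add per weight, per row of the batch.
    flops = 2 * batch * d_in * d_out
    # Weights are read once regardless of batch; activations scale with batch.
    bytes_moved = bytes_per_elem * (d_in * d_out + batch * d_in + batch * d_out)
    return flops / bytes_moved


if __name__ == "__main__":
    # Hypothetical layer size; batch 1 is the GEMV case.
    for batch in (1, 8, 64, 512):
        print(batch, round(arithmetic_intensity(batch, 4096, 4096), 2))
```

With fp16 weights the intensity stays roughly equal to the batch size while the batch is small relative to the matrix dimensions, because the weight read dominates the traffic and is amortized over every row in the batch. Whether a given batch size is enough to become compute bound depends on the hardware's ratio of peak FLOPs to memory bandwidth.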