Comment by ironbound
12 hours ago
The DeepSeek-V3 paper details a quantisation method that applies scaling after the matmul but before accumulation to improve precision. This differs from a normal GEMM, where the scaling is left until the end. You can read more in section 3.3 of the paper below.
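To make the difference concrete, here is a minimal pure-Python sketch of a single dot product (one output element of a GEMM), not the paper's actual kernel. It assumes a signed integer grid (`QMAX = 127`) as a stand-in for a low-precision format, max-abs scaling, and a block size of 4; the function names `quantise`, `dot_scale_at_end` and `dot_scale_per_block` are made up for illustration.

```python
QMAX = 127  # assumption: integer grid standing in for a low-precision format


def quantise(block):
    """Return (integer codes, scale) for one block using max-abs scaling."""
    amax = max(abs(x) for x in block)
    scale = amax / QMAX if amax > 0 else 1.0
    return [round(x / scale) for x in block], scale


def dot_scale_at_end(a, b):
    """One scale per vector; scaling is applied once, after the full sum."""
    qa, sa = quantise(a)
    qb, sb = quantise(b)
    acc = sum(x * y for x, y in zip(qa, qb))
    return acc * sa * sb


def dot_scale_per_block(a, b, block=4):
    """One scale per block; each partial sum is rescaled before it is accumulated."""
    acc = 0.0  # higher-precision accumulator
    for i in range(0, len(a), block):
        qa, sa = quantise(a[i:i + block])
        qb, sb = quantise(b[i:i + block])
        partial = sum(x * y for x, y in zip(qa, qb))  # low-precision partial product
        acc += partial * sa * sb  # scale first, then accumulate
    return acc


# Hypothetical inputs: each vector mixes very small and very large magnitudes.
a = [0.01, 0.02, 0.03, 0.04, 100.0, 80.0, 60.0, 40.0]
b = [1.0, 1.0, 1.0, 1.0, 0.001, 0.001, 0.001, 0.001]

exact = sum(x * y for x, y in zip(a, b))
err_end = abs(dot_scale_at_end(a, b) - exact)
err_block = abs(dot_scale_per_block(a, b) - exact)
print(exact, err_end, err_block)
```

On these inputs the single-scale version rounds the small entries of both vectors to zero, so the per-block version lands much closer to the exact value. The trade-off is the extra multiply per block inside the accumulation loop.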