Comment by spmurrayzzz

7 hours ago

I'm not entirely up to date with the latest batch, but I've reviewed some of the rollouts in the past, and my sense is that the models are surprisingly good at producing correct custom kernels on the happy path but still weak on sustained, shape-robust workloads. Writing the full path from scratch, compounded by weird memory layouts, odd sizes, routing, unpacking quantized weights, etc., is definitely challenging.
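A rough illustration of why odd sizes hurt: many hand-written kernels compute the output in fixed-size tiles and mask off the tail. The sketch below assumes a hypothetical kernel with 128x128 output tiles (the tile size and shapes are illustrative, not taken from the eval). It computes what fraction of the launched tile work lands on real output elements.

```python
import math


def tile_utilization(m: int, n: int, tile_m: int = 128, tile_n: int = 128) -> float:
    """Fraction of launched tile work that covers real output elements.

    Assumes a hypothetical kernel that always computes full tile_m x tile_n
    output tiles and simply masks off the out-of-bounds tail.
    """
    tiles_m = math.ceil(m / tile_m)
    tiles_n = math.ceil(n / tile_n)
    launched = tiles_m * tile_m * tiles_n * tile_n  # elements the kernel actually computes
    return (m * n) / launched                       # elements that are actually wanted


# A size just past a tile boundary nearly halves the useful work.
for shape in [(4096, 4096), (4000, 4096), (129, 4096)]:
    print(shape, f"{tile_utilization(*shape):.1%}")
```

A kernel that is only tuned for the aligned case can look great on (4096, 4096) and quietly waste about half its tiles on (129, 4096), which is one reason "correct" and "shape-robust" are different bars.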

Also, you could argue that at least some of this is arbitrary and scoped entirely to the eval itself. The fp8 GEMM score could be low simply because one of the shapes is fairly skinny (i.e., there isn't enough math work to keep the compute engine busy for a meaningful amount of time).
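To put a number on "skinny", one can compare arithmetic intensity (FLOPs per byte of memory traffic) for a square GEMM and a skinny one. This is a back-of-the-envelope sketch under stated assumptions: fp8 inputs at 1 byte per element, a bf16 output at 2 bytes, and each operand moved to or from memory exactly once. The shapes are hypothetical, not the eval's actual ones.

```python
def gemm_arithmetic_intensity(m: int, n: int, k: int,
                              in_bytes: int = 1, out_bytes: int = 2) -> float:
    """FLOPs per byte for C[m, n] = A[m, k] @ B[k, n].

    Assumes (hypothetically) fp8 A and B (1 byte each), a bf16 C (2 bytes),
    and ideal reuse: every operand is read or written exactly once.
    """
    flops = 2 * m * n * k                      # one multiply + one add per MAC
    bytes_moved = (m * k + k * n) * in_bytes + m * n * out_bytes
    return flops / bytes_moved


square = gemm_arithmetic_intensity(4096, 4096, 4096)
skinny = gemm_arithmetic_intensity(16, 4096, 4096)
print(f"square: {square:.0f} FLOP/byte, skinny: {skinny:.1f} FLOP/byte")
```

With M = 16 the whole B matrix still has to be streamed in, but there is only a sliver of math to amortize it over, so intensity drops from about 2048 to about 32 FLOP/byte. Once that falls below the device's ratio of peak FLOP rate to memory bandwidth, the kernel is memory-bound, and no amount of tensor-core tuning will make the "compute engine" look busy.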