Comment by jtbaker

3 hours ago

It's already greatly improved over previous generations because the M5s have tensor cores (higher compute throughput for matmul operations, which are the bottleneck for prefill).
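For anyone wondering why matmul throughput matters so much for prefill in particular, here's a rough back-of-the-envelope sketch. The numbers are assumptions I picked for illustration (a hypothetical 4096x4096 fp16 weight matrix and a 2048-token prompt), not measurements from any M-series chip. It just compares FLOPs per byte moved when one token goes through a layer (decode) versus a whole prompt at once (prefill):

```python
def arithmetic_intensity(n_tokens, d_in, d_out, bytes_per_value=2):
    """FLOPs per byte moved for one (n_tokens x d_in) @ (d_in x d_out) matmul.

    Hypothetical helper for illustration: counts one read of the weights,
    one read of the input activations, and one write of the outputs.
    bytes_per_value=2 assumes fp16/bf16 storage.
    """
    flops = 2 * n_tokens * d_in * d_out  # one multiply + one add per term
    values_moved = d_in * d_out + n_tokens * d_in + n_tokens * d_out
    return flops / (bytes_per_value * values_moved)


# Assumed sizes, chosen only to make the contrast visible.
decode = arithmetic_intensity(n_tokens=1, d_in=4096, d_out=4096)
prefill = arithmetic_intensity(n_tokens=2048, d_in=4096, d_out=4096)

print(f"decode  (1 token):     {decode:.2f} FLOPs/byte")
print(f"prefill (2048 tokens): {prefill:.2f} FLOPs/byte")
```

Under those assumptions decode does about 1 FLOP per byte, so it mostly waits on memory bandwidth, while prefill reuses each weight across the whole prompt and does on the order of a thousand FLOPs per byte, so it is limited by how fast the hardware can multiply matrices.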
