Comment by gunalx
9 hours ago
They probably use it on all models. Fast is probably just a resource pool with less congestion, so each user gets faster throughput, but the hardware is used less efficiently.
If it speeds up prefill too, I guess so.