Comment by nabakin
2 days ago
> On throughput-focused benchmarks, Tokasaurus can outperform vLLM and SGLang by up to 3x+.
Looks like they don't compare to TensorRT-LLM throughput numbers which, last I checked, are SOTA in open source.
2 days ago
TensorRT-LLM being open source is a lie: all the important kernels are loaded from cubins.
It also appears that this was a sampling benchmark, which is not representative.
The generation benchmark was 5% faster than SGLang.