Comment by nabakin

2 days ago

> On throughput-focused benchmarks, Tokasaurus can outperform vLLM and SGLang by up to 3x+.

Looks like they don't compare to TensorRT-LLM throughput numbers which, last I checked, are SOTA in open source.

Calling TensorRT-LLM open source is misleading, though; all the important kernels are loaded from precompiled cubins.

It also appears that this was a sampling benchmark, which is not representative of typical workloads.

On the generation benchmark, it was only 5% faster than SGLang.