Comment by liuliu

3 months ago

This is not different from mlx-lm other than it uses a closed-source inference engine.

Respectfully, the benchmarks show it is different.

MetalRT and mlx-lm load the exact same model files (identical 4-bit MLX weights), which makes this a pure engine-to-engine comparison:

LLM decode: MetalRT is 1.10-1.19x faster across all models tested

STT: 70 s of audio transcribed in 101ms vs 463ms (4.6x faster)

TTS: 178ms vs 493ms (2.8x faster)
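The multipliers above are plain ratios of the published numbers. For a latency metric (STT, TTS) the speedup is baseline time divided by MetalRT time. For a throughput metric such as decode tokens/s it is the other way round: MetalRT rate divided by baseline rate. Below is a minimal Python sketch that recomputes the quoted STT and TTS multipliers from the quoted milliseconds. The function names are illustrative only and are not part of rcli, MetalRT, or mlx-lm.

```python
# Published latencies in milliseconds, copied from the comment above.
# Each tuple is (MetalRT, mlx baseline as reported).
LATENCIES_MS = {
    "STT (70 s audio)": (101.0, 463.0),
    "TTS": (178.0, 493.0),
}


def latency_speedup(fast_ms: float, baseline_ms: float) -> float:
    """Speedup for a latency metric: how many times shorter the fast run is."""
    if fast_ms <= 0 or baseline_ms <= 0:
        raise ValueError("latencies must be positive")
    return baseline_ms / fast_ms


def throughput_speedup(fast_rate: float, baseline_rate: float) -> float:
    """Speedup for a throughput metric such as decode tokens/s (higher is better)."""
    if fast_rate <= 0 or baseline_rate <= 0:
        raise ValueError("throughputs must be positive")
    return fast_rate / baseline_rate


if __name__ == "__main__":
    for task, (metalrt_ms, baseline_ms) in LATENCIES_MS.items():
        # Prints the ratio rounded to one decimal, matching the published figures.
        print(f"{task}: {latency_speedup(metalrt_ms, baseline_ms):.1f}x")
```

Running it reproduces the 4.6x (463 / 101) and 2.8x (493 / 178) figures. The decode claim of 1.10-1.19x would be computed with `throughput_speedup` on measured tokens/s.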

mlx-lm is built on MLX, a general-purpose array computation framework that also supports inference. MetalRT is purpose-built for inference only, and that focus is where the performance gap comes from.

You can reproduce these numbers yourself: rcli bench runs the same benchmarks we published. Full methodology: https://www.runanywhere.ai/blog/metalrt-fastest-llm-decode-e...

Yes, MetalRT is closed-source. We're transparent about that. The performance difference is the reason it exists.