Respectfully, the benchmarks show it is different.
MetalRT and mlx-lm use the exact same model files, identical 4-bit MLX weights. That makes it a pure engine-to-engine comparison:
LLM decode: MetalRT is 1.10-1.19x faster across all models tested
STT: 70s of audio transcribed in 101ms (MetalRT) vs 463ms (4.6x faster)
TTS: 178ms (MetalRT) vs 493ms (2.8x faster)
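If you want to sanity-check the ratios, the arithmetic is just baseline latency divided by MetalRT latency. Here is a minimal sketch in Python, reading the first figure in each pair as MetalRT and the second as the baseline; the variable and function names are only for illustration and are not part of rcli or either engine.

```python
# Latencies quoted above, in milliseconds: (MetalRT, baseline).
# Assumption: the first number in each pair is MetalRT.
latencies_ms = {
    "STT": (101, 463),
    "TTS": (178, 493),
}


def speedup(fast_ms, slow_ms):
    """How many times faster the fast engine is than the slow one."""
    return slow_ms / fast_ms


speedups = {
    task: round(speedup(fast, slow), 1)
    for task, (fast, slow) in latencies_ms.items()
}
print(speedups)  # {'STT': 4.6, 'TTS': 2.8}

# Real-time factor for the STT run: 70 s of audio processed in 101 ms.
stt_realtime_factor = 70_000 / 101
print(round(stt_realtime_factor))  # 693
```

The last figure is another way to read the STT result: 70s of audio in 101ms works out to roughly 693x faster than real time.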
mlx-lm runs on MLX, a general-purpose array computation framework that also supports inference. MetalRT is purpose-built for inference only. That focus is where the performance gap comes from.
You can reproduce these numbers yourself: rcli bench runs the same benchmarks we published. Full methodology: https://www.runanywhere.ai/blog/metalrt-fastest-llm-decode-e...
Yes, MetalRT is closed-source. We're transparent about that. The performance difference is the reason it exists.