Comment by focusgroup0

3 months ago

The fact that Apple didn't ship this in years after Siri acquisition is an indictment of its Product leadership

Apple has the silicon, the frameworks (MLX, CoreML), and the models. The gap is putting it all together into a fast, unified on-device pipeline. That's what we're focused on, and honestly, we think Apple will eventually ship something similar natively. Until then, we're trying to show what's possible today on their hardware.

This is not different from mlx-lm other than it uses a closed-source inference engine.

  • Respectfully, the benchmarks show it is different.

    MetalRT and mlx-lm use the exact same model files, identical 4-bit MLX weights. That makes it a pure engine-to-engine comparison:

    LLM decode: MetalRT is 1.10-1.19x faster across all models tested

    STT: 70s audio in 101ms vs 463ms (4.6x faster)

    TTS: 178ms vs 493ms (2.8x faster)
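
    To make the arithmetic behind those ratios explicit, here is a minimal sketch that recomputes them from the latencies quoted above. It assumes the first number in each pair is MetalRT and the second is the baseline; the function and variable names are illustrative only and are not part of rcli or MetalRT.

```python
def speedup(baseline_ms: float, candidate_ms: float) -> float:
    """Return how many times faster the candidate is than the baseline."""
    return baseline_ms / candidate_ms


# Latencies quoted in the comment, in milliseconds.
# Assumption: the lower number in each pair is MetalRT.
stt = speedup(baseline_ms=463, candidate_ms=101)
tts = speedup(baseline_ms=493, candidate_ms=178)

# Seconds of audio transcribed per second of compute for the 70 s STT clip.
stt_audio_per_second = 70 / (101 / 1000)

print(f"STT speedup: {stt:.1f}x")
print(f"TTS speedup: {tts:.1f}x")
print(f"STT throughput: {stt_audio_per_second:.0f}s of audio per second")
```

    The rounded results match the 4.6x and 2.8x figures stated in the list.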

    mlx-lm runs on MLX, a general-purpose array computation framework that also supports inference. MetalRT is purpose-built for inference only. That focus is where the performance gap comes from.

    You can reproduce these numbers yourself: rcli bench runs the same benchmarks we published. Full methodology: https://www.runanywhere.ai/blog/metalrt-fastest-llm-decode-e...

    Yes, MetalRT is closed-source. We're transparent about that. The performance difference is the reason it exists.