Comment by sanchitmonga22

3 months ago

Fair feedback on the README's clarity. We've updated it to make the licensing distinction between RCLI (MIT) and MetalRT (proprietary) more prominent. That should have been clearer from day one.

On why we built MetalRT instead of using CoreML or MLX:

CoreML is optimized for classification and vision models, not autoregressive text generation. The ANE (Apple Neural Engine) is powerful for fixed-shape workloads, but it doesn't handle the dynamic shapes of LLM decoding well.

MLX is much closer to what we need, and we respect what Apple has built. But MLX is a general-purpose array framework, so it carries abstractions for developer ergonomics and portability that add overhead. MetalRT is purpose-built for inference only, and the numbers reflect that: 1.1 to 1.2x faster on LLMs (same model files) and 4.6x faster on STT.

We also needed one unified engine for LLM + STT + TTS rather than stitching three separate runtimes together. None of the alternatives listed offers that.

The libraries you mentioned (FluidAudio, mlx-swift-audio, sherpa-onnx) are good projects. RCLI actually uses sherpa-onnx as its fallback engine when MetalRT isn't installed. They solve different problems at different layers of the stack.