Comment by sanchitmonga22

3 months ago

Fair question, let me clarify.

RunAnywhere is an inference company. We build the runtime layer for on-device AI.

There are two pieces:

MetalRT, a proprietary GPU inference engine for Apple Silicon. It runs LLMs, speech-to-text, and text-to-speech faster than anything else available (benchmarks: https://www.runanywhere.ai/blog/metalrt-fastest-llm-decode-e...). This is our core product.

RCLI, an open-source CLI (MIT) that demonstrates what MetalRT enables. It wires STT + LLM + TTS into a real voice pipeline with 43 macOS actions, local RAG, and a TUI. Think of it as the reference application built on top of the engine.

On RAG specifically: voice + document Q&A is a natural pairing for on-device use cases. You have sensitive documents you don't want to upload to the cloud, you ingest them locally, and then ask questions by voice. The retrieval runs at ~4ms over 5K+ chunks, so it feels instant in the voice pipeline. Its not random, it's one of the strongest privacy arguments for running everything locally.

The longer-term vision is bringing MetalRT to more chips and platforms, so any developer can get cloud-competitive inference on-device with minimal integration effort.