Comment by rushingcreek
3 months ago
Very cool, congrats! I'm curious how you were able to achieve this given Apple's many undocumented APIs. Does it use private Neural Engine APIs or fully public Metal APIs?
Either way, this is a tremendous achievement and it's extremely relevant in the OpenClaw world where I might not want to have sensitive information leave my computer.
Fully public Metal APIs, no private frameworks, no Neural Engine, no undocumented entitlements.
MetalRT is built on the public Metal API. The performance comes from how we use the GPU, not from accessing anything Apple doesn't document.
We specifically chose to stay on public APIs so that MetalRT works on any Apple Silicon Mac without special entitlements or SIP workarounds. This also means its App Store compatible for future macOS/iOS distribution.
The results speak for themselves: 1.1-1.19x faster than Apple's own MLX on identical model files, 4.6x faster on STT, 2.8x faster on TTS. Full methodology published here: https://www.runanywhere.ai/blog/metalrt-fastest-llm-decode-e...
Appreciate the kind words, the "OpenClaw world" framing is exactly why we built this.