
Comment by vessenes

3 months ago

Just tried it. Really cool, and a fun tech demo with rcli. I filed a bug report; not everything loads properly when installed via Homebrew.

Quick request: unsloth quants; bit for bit they're usually better. Or, more generally, a UI for Hugging Face model selection. I understand you won't be able to serve everything, but I want to mix and match!

Also - grounding:

"open safari" (safari opens, voice says: "I opened safari") "navigate to google.com in safari" (nothing happens, voice says: "I navigated to google.com")

Anyway, really fun.

Thanks for trying it and for filing the bug. We're looking into the Homebrew install issue.

On unsloth quants: agreed, they're consistently better bit-for-bit. Adding broader quantization format support (including unsloth's approach) is on the roadmap. Right now MetalRT works with MLX 4-bit files and GGUF Q4_K_M, and we want to expand that.

On the grounding issue ("navigate to google.com" not actually navigating): you're right, that's a gap. The "open_url" action exists, but the LLM doesn't always route to it correctly, especially with compound commands. Small models (0.6B-1.2B) have limited tool-calling accuracy; upgrading to Qwen3.5 4B via rcli upgrade-llm helps significantly. We're also improving the action routing prompts.

Appreciate the detailed feedback, this is exactly what we need.

> "open safari" (safari opens, voice says: "I opened safari") "navigate to google.com in safari" (nothing happens, voice says: "I navigated to google.com")

So you’re describing a core broken feature. Application breaking at easiest test.

  • Fair criticism. The action executed on the LLM side but didn't translate to the correct macOS action: the model hallucinated success instead of routing to the open_url tool.

    This is a known limitation with small LLMs (0.6B-1.2B) doing tool calling. They sometimes confuse "I know what you want" with "I did it." Upgrading to a larger model improves tool-calling accuracy significantly.

    We're also working on verification: having the pipeline confirm the action actually succeeded before reporting back. That's a fair expectation and we should meet it.
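
For readers following along, here is a minimal sketch of what "confirm the action actually succeeded before reporting back" could look like. It is not rcli's code: `ToolCall`, `Tool`, `respond`, and the fake browser state are all hypothetical names made up for illustration, and only the `open_url` action name comes from the thread. The idea is that the spoken reply is chosen by the pipeline after a postcondition check, not by the model.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class ToolCall:
    name: str  # e.g. "open_url" (the action named in the thread)
    arg: str


@dataclass
class Tool:
    execute: Callable[[str], None]  # performs the action
    verify: Callable[[str], bool]   # checks the action's effect afterwards


def respond(call: Optional[ToolCall], tools: Dict[str, Tool]) -> str:
    """Return the spoken reply, claiming success only after verification."""
    if call is None or call.name not in tools:
        # The model produced text but no usable tool call: nothing was done.
        return "Sorry, I couldn't do that."
    tool = tools[call.name]
    try:
        tool.execute(call.arg)
    except Exception:
        return f"Sorry, {call.name} failed."
    if not tool.verify(call.arg):
        return f"Sorry, {call.name} didn't take effect."
    return f"Done: {call.name} {call.arg}"


# Stand-in for the browser: a fake state dict instead of real macOS calls.
state = {"url": None}


def fake_open(url: str) -> None:
    state["url"] = url


tools = {
    "open_url": Tool(
        execute=fake_open,
        verify=lambda url: state["url"] == url,
    )
}

print(respond(ToolCall("open_url", "google.com"), tools))
print(respond(None, tools))
```

In the "navigate to google.com" case from the report, the model emitted no matching tool call, so a pipeline shaped like this would take the first branch and admit failure instead of saying "I navigated to google.com".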

    • > This is a known limitation with small LLMs (0.6B-1.2B) doing tool calling.

      To me this is the nut to crack, wrt tool calling and locally running inference. This seems like a really cool project and I'm going to dive in a little later, but if it's hallucinating on something as basic as this, that makes me think it's more at the POC stage right now (to echo other sentiment here).
