Comment by pbronez

11 hours ago

I haven’t tried it yet, but Rapid MLX has a neat feature for automatic model switching. It runs a local model using Apple’s MLX framework, then “falls forward” to the cloud dynamically based on usage patterns:

> Smart Cloud Routing > > Large-context requests auto-route to a cloud LLM (GPT-5, Claude, etc.) when local prefill would be slow. Routing based on new tokens after cache hit. --cloud-model openai/gpt-5 --cloud-threshold 20000