Comment by Abishek_Muthian
7 months ago
> - Tab completion model (Cursor's remaining moat)
My local Ollama + Continue + Qwen 2.5 Coder setup gives good tab completion with minimal latency; how much better is Cursor's tab completion model?
I'm still wary of letting an LLM edit my code, so my local setup gives me sufficient assistance with tab completion and occasional chat.
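For anyone curious, the tab completion side is just fill-in-the-middle requests against the local Ollama server; here is a minimal sketch of roughly what the plugin does (the model tag and the FIM `suffix` field are assumptions, adjust to your install):

```python
# Minimal sketch of a fill-in-the-middle (FIM) completion request against
# a local Ollama server, roughly what an editor plugin does for tab
# completion. Assumptions: default Ollama port, a pulled qwen2.5-coder:7b
# tag, and FIM "suffix" support in your Ollama build.
import json
import urllib.request

def complete(prefix: str, suffix: str = "") -> str:
    payload = {
        "model": "qwen2.5-coder:7b",  # assumed tag; use whatever you pulled
        "prompt": prefix,             # text before the cursor
        "suffix": suffix,             # text after the cursor (FIM)
        "stream": False,
        "options": {"num_predict": 64, "temperature": 0.2},
    }
    req = urllib.request.Request(
        "http://localhost:11434/api/generate",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

# Ask the model to fill in the function body at the "cursor".
print(complete("def fib(n):\n    ", "\n\nprint(fib(10))"))
```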
I often use the same setup. Qwen 2.5 Coder is very good on its own, but my Emacs setup can't also reach for web search when that would be appropriate. I have separately been experimenting with the Perplexity Sonar APIs, which combine models and search, but I don't have that integrated with my Emacs and Qwen setup - and that automatic integration would be very difficult to do well! If I could 'automatically' use a local Qwen or another model, and fall back to a paid service like Perplexity or the Gemini grounding APIs only when needed, that would be fine indeed.
I am thinking about a new setup as I write this: in Emacs I already explicitly choose between a local Ollama model and a paid API like Gemini or OpenAI, so I should just make calling the Perplexity Sonar APIs another manual choice. (Currently I only use Perplexity from Python scripts.)
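Concretely, the shape I have in mind is a single entry point where the backend is an explicit argument, so Sonar becomes just one more manual choice next to the local model. A rough sketch, with endpoint shapes and model tags as assumptions from memory:

```python
# Rough sketch of the "manual choice" idea: one ask() entry point, with
# the backend picked explicitly rather than auto-detected. Endpoint
# shapes and model tags are assumptions, not a vetted client.
import json
import os
import urllib.request

def _post(url: str, payload: dict, headers: dict) -> dict:
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json", **headers},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())

def ask(prompt: str, backend: str = "local") -> str:
    if backend == "local":  # local Ollama; nothing leaves the machine
        out = _post(
            "http://localhost:11434/api/generate",
            {"model": "qwen2.5-coder:7b", "prompt": prompt, "stream": False},
            {},
        )
        return out["response"]
    if backend == "sonar":  # Perplexity's OpenAI-compatible, search-grounded API
        out = _post(
            "https://api.perplexity.ai/chat/completions",
            {"model": "sonar", "messages": [{"role": "user", "content": prompt}]},
            {"Authorization": f"Bearer {os.environ['PERPLEXITY_API_KEY']}"},
        )
        return out["choices"][0]["message"]["content"]
    raise ValueError(f"unknown backend: {backend}")
```

The hard part I mentioned, automatically deciding when a prompt actually needs search, is deliberately left out here; the backend stays a manual choice.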
If I owned a company, I would have to regularly evaluate the privacy and security implications of using commercial APIs. Running everything locally with Ollama sidesteps that entirely.
What kind of hardware are you using?
I use a laptop with an RTX 4090 (16 GB VRAM), a Core i9, and 96 GB RAM for low-latency work, and a Mac mini M4 for tasks that don't require low latency.
I wrote a blog post a while back on how I run LLMs locally [1]; I'll update it with the newer models and the Mac mini soon.
[1] https://abishekmuthian.com/how-i-run-llms-locally/
Have you tried hooking your local setup up to Cline/Roo or similar tools?