Comment by Aperocky

7 days ago

The big question - which of these new agents can consume local models to a reasonable degree? I would like to ditch the dependency on external APIs and am willing to trade some performance for that.

Crush has an open issue (two weeks old) to add Ollama support - it's in progress.

  • FYI it works already even without this feature branch (you'll just have to add your provider and models manually)

    ```
    {
      "providers": {
        "ollama": {
          "type": "openai",
          "base_url": "http://localhost:11434/v1",
          "api_key": "ollama",
          "models": [
            {
              "id": "llama3.2:3b",
              "model": "Llama 3.2 3B",
              "context_window": 131072,
              "default_max_tokens": 4096,
              "cost_per_1m_in": 0,
              "cost_per_1m_out": 0
            }
          ]
        }
      }
    }
    ```

  • why?

    It's basic - edit the config file. I just downloaded it; in ~/.cache/share/crush/providers.json, add your own provider or edit an existing one.

    Edit api_endpoint, done.
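
    Roughly the kind of entry you end up with - a sketch only; the field names mirror the config above plus the api_endpoint key this file uses, so check the real schema in the file you actually downloaded:

    ```
    {
      "ollama": {
        "type": "openai",
        "api_endpoint": "http://localhost:11434/v1",
        "api_key": "ollama",
        "models": [
          {
            "id": "llama3.2:3b",
            "model": "Llama 3.2 3B",
            "context_window": 131072,
            "default_max_tokens": 4096,
            "cost_per_1m_in": 0,
            "cost_per_1m_out": 0
          }
        ]
      }
    }
    ```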

Most of these agents work with any OpenAI-compatible endpoint.
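
"OpenAI compatible" here just means the server answers POST /v1/chat/completions with the standard request shape, so any agent that lets you override the base URL and model id can talk to Ollama, llama.cpp's server, vLLM, LM Studio, and so on. For reference, a minimal request body looks like this (the model id is whatever your local server exposes):

```
{
  "model": "llama3.2:3b",
  "messages": [
    {"role": "system", "content": "You are a coding agent."},
    {"role": "user", "content": "List the repo files you would read first."}
  ],
  "stream": true
}
```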

  • Actually not really.

    I spent at least an hour trying to get OpenCode to use a local model and then found a graveyard of PRs begging for Ollama support, or even just the ability to add an OpenAI endpoint in the GUI. I guess the maintainers simply don't care. I tried adding it to the backend config, and it kept overwriting/deleting my config. Got frustrated and deleted it. Sorry, not sorry - I shouldn't need another cloud subscription to use your app.

    Claude Code you can sort of get to work with a bunch of hacks, but it involves setting up a proxy, isn't supported natively, and the tool calling ends up somewhat messed up.

    Warp seemed promising, until I found out the founders would rather alienate their core demographic despite ~900 votes on the GitHub issue asking to allow local models: https://github.com/warpdotdev/Warp/issues/4339. So I deleted their crappy app; even Cursor provides some basic support for an OpenAI endpoint.

    • > I spent at least an hour trying to get OpenCode to use a local model and then found a graveyard of PRs begging for Ollama support

      Almost from day one of the project, I've been able to use local models. Llama.cpp worked out of the box with zero issues, same with vLLM and SGLang. The only tweak I had to make initially was manually changing the system prompt in my fork, but now you can do that via their custom modes feature.

      The ollama support issues are specific to that implementation.

    • LM Studio is probably better in this regard. I was able to get LM Studio to work with Cursor, a product known for specifically avoiding support for local models. The only requirement: if the product routes requests through its own servers as a middleman, which is what Cursor does, you need to port forward.

    • I still haven't seen any local models served by Ollama handle tool calls well via that OpenAI endpoint. Have you had any success there?
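
      For context, by "handle tool calls" I mean: the request advertises a tools array, and the model comes back with a well-formed tool_calls message whose arguments parse as JSON - roughly like this (read_file is just a made-up tool name for illustration):

      ```
      {
        "role": "assistant",
        "content": null,
        "tool_calls": [
          {
            "id": "call_1",
            "type": "function",
            "function": {
              "name": "read_file",
              "arguments": "{\"path\": \"main.go\"}"
            }
          }
        ]
      }
      ```

      That's the shape the agents expect back; what I keep getting from small local models doesn't quite match it.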

What happens if you just point it at its own source and ask it to add the feature?

  • It will add the feature. I saw OpenAI claim that developers are adding their own features, saw Anthropic make the same claim, and Aider's Paul often says Aider wrote most of its own code. I started building my own coding CLI for the fun of it, then figured, why not have it start developing its own features - and it does that too. It's only as good as the model. For ish and giggles, I just downloaded Crush, pointed it at a local qwen3-30b-a3b (a very small model), and had it load the code, refactor itself, and point out bugs. I have never used LSP and just wanted to see how it performs compared to tree-sitter.