Comment by Greed

8 hours ago

Getting Opus to call something local sounds interesting, since that's more or less what it's doing with Sonnet anyway if you're using Claude Code. How are you getting it to call out to local models? Skills? Or paying the API costs and using Pi?

I just start llama.cpp serve with the gguf which creates an openai compatible endpoint.

The session so far is stored in a file like /tmp/s.json messages array. Claude reads that file, appends its response/query, sends it to the API and reads the response.

I simply wrapped this process in a python script and added tool calling as well. Tools run on the client side. If you have Claude, just paste this in :-)