Comment by nurettin

5 hours ago

I just start the llama.cpp server with the GGUF model, which exposes an OpenAI-compatible endpoint.

The session so far is stored as a messages array in a file like /tmp/s.json. Claude reads that file, appends its response/query, sends it to the API, and reads the response.

I simply wrapped this process in a Python script and added tool calling as well. Tools run on the client side. If you have Claude, just paste this in :-)
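
The loop described above can be sketched roughly like this (a minimal illustration, not the commenter's actual script; the port, model name, and function names are assumptions, and llama.cpp's server listens on 8080 by default):

```python
import json
import urllib.request
from pathlib import Path

SESSION_FILE = Path("/tmp/s.json")  # session file holding the messages array
API_URL = "http://localhost:8080/v1/chat/completions"  # llama.cpp OpenAI-compatible endpoint (port assumed)

def load_messages(path=SESSION_FILE):
    """Read the messages array from the session file, or start a fresh one."""
    if path.exists():
        return json.loads(path.read_text())
    return []

def append_message(messages, role, content, path=SESSION_FILE):
    """Append one message and persist the whole session back to disk."""
    messages.append({"role": role, "content": content})
    path.write_text(json.dumps(messages, indent=2))
    return messages

def chat(messages, url=API_URL, model="local"):
    """POST the full session to the endpoint and return the assistant's reply text."""
    body = json.dumps({"model": model, "messages": messages}).encode()
    req = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        data = json.loads(resp.read())
    return data["choices"][0]["message"]["content"]
```

Each turn is then: load the file, append the new user message, call `chat`, append the reply with role "assistant", and the file keeps the whole conversation between invocations.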