Comment by fforflo
4 days ago
If you want to use Ollama to run local models, here’s a simple example:
from ollama import chat, ChatResponse

def call_llm(prompt: str, use_cache: bool = True, model: str = "phi4") -> str:
    # use_cache is kept for interface compatibility but isn't used here
    # Send the prompt to the locally running Ollama model and return the reply text
    response: ChatResponse = chat(
        model=model,
        messages=[{'role': 'user', 'content': prompt}],
    )
    return response.message.content
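A minimal way to try it, assuming the Ollama server is running locally and the phi4 model has already been pulled (the prompt below is just a placeholder):

if __name__ == "__main__":
    # Requires `ollama serve` running and `ollama pull phi4` done beforehand
    print(call_llm("Summarize what this repository does in one sentence."))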
Is the output as good?
I'd love the ability to run the LLM locally, as that would make it easier to run on non-public code.
It's decent enough. But you'd probably have to use a model like llama2, which may set your GPU on fire.