Comment by symbolicAGI
2 days ago
Given chat and API needs for low latency, llama.cpp is probably still the best choice for self-hosted models, with or without GPU support. And Ollama is the leading wrapper around llama.cpp.
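For anyone who hasn't tried it, a minimal sketch of what "chat and API" looks like against a local Ollama server. This assumes the server is running on its default port (11434) and that a model named llama3 has already been pulled with `ollama pull llama3`; substitute whatever model you actually have.

```python
# Minimal sketch: query a local Ollama server over its HTTP chat API.
# Assumes Ollama is listening on the default port 11434 and a model
# named "llama3" has been pulled locally.
import json
import urllib.request

payload = json.dumps({
    "model": "llama3",  # assumed model name; swap in your own
    "messages": [{"role": "user", "content": "Why is the sky blue?"}],
    "stream": False,    # return one JSON object instead of a token stream
}).encode("utf-8")

req = urllib.request.Request(
    "http://localhost:11434/api/chat",
    data=payload,
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    reply = json.loads(resp.read())

print(reply["message"]["content"])
```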
Because Tokasaurus was mentioned as better than Ollama for running Darwinian Gödel Machine operations (self-improvement), I looked for the linked repo on GitHub and it was 404. So glad it is back: https://github.com/ScalingIntelligence/tokasaurus.