Comment by embedding-shape
5 hours ago
> Maybe just a direct layer on top of vllm
My dream would be something like vLLM, but without all the Python mess, packaged as a single binary that has both an HTTP server and a desktop GUI, and can browse/download models. Llama.cpp is like 70% there, but there's a large performance difference between llama.cpp and vLLM for the models I use.