pheggs 2 months ago
You can pull models directly from Hugging Face with llama.cpp, and it also has a decent web chat included.
speedgoose 2 months ago
Does it have a model registry with an API and hot swapping, or do you still have to use something like llama-swap, as suggested in the article? Or is it CLI only?
dminik 2 months ago
You can now serve multiple models, with loading/unloading, using just the server binary.
https://github.com/ggml-org/llama.cpp/blob/master/tools/serv...
speedgoose 2 months ago
Then it only lacks automatic FIFO loading/unloading. Maybe it will be there in a few weeks.
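For readers unsure what "automatic FIFO loading/unloading" would mean in practice, here is a minimal Python sketch of the general idea: keep at most a fixed number of models loaded and, when a new one is requested, unload whichever was loaded first. This is not llama.cpp's implementation or API. `FifoModelPool`, the `load`/`unload` callbacks, and the model names are all hypothetical stand-ins for illustration.

```python
from typing import Any, Callable, Dict, List


class FifoModelPool:
    """Keeps at most `capacity` models loaded; evicts the oldest-loaded first.

    Hypothetical illustration only: `load` and `unload` stand in for whatever
    actually starts and stops a model.
    """

    def __init__(
        self,
        capacity: int,
        load: Callable[[str], Any],
        unload: Callable[[str, Any], None],
    ) -> None:
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self._capacity = capacity
        self._load = load
        self._unload = unload
        # Plain dicts preserve insertion order, so the first key is always
        # the model that was loaded earliest.
        self._loaded: Dict[str, Any] = {}

    def get(self, name: str) -> Any:
        """Return the model called `name`, loading it (and evicting) if needed."""
        if name in self._loaded:
            # FIFO, not LRU: a cache hit does not refresh the model's position.
            return self._loaded[name]
        while len(self._loaded) >= self._capacity:
            oldest = next(iter(self._loaded))
            self._unload(oldest, self._loaded.pop(oldest))
        model = self._load(name)
        self._loaded[name] = model
        return model

    def loaded(self) -> List[str]:
        """Names of currently loaded models, oldest first."""
        return list(self._loaded)


# Usage with dummy callbacks; "a", "b", "c" are placeholder model names.
pool = FifoModelPool(
    capacity=2,
    load=lambda name: f"<model {name}>",
    unload=lambda name, model: None,
)
pool.get("a")
pool.get("b")
pool.get("a")  # already loaded, order unchanged
pool.get("c")  # pool is full, so "a" (loaded first) is unloaded
print(pool.loaded())  # ['b', 'c']
```

Note that "a" is evicted even though it was the most recently used; that is the difference between FIFO and an LRU policy, which would have evicted "b" instead.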