Comment by bityard
5 hours ago
I have a Framework Desktop too and 20-25 t/s is a lot better than I was expecting for such a large dense model. I'll have to try it out tonight. Are you using llama.cpp?
LM Studio, but it runs inference through llama.cpp, so yes. This is with the Vulkan backend, not ROCm.