It sometimes works.
How so? It's rock solid for me. I use ollama, but it's based on llama.cpp.
It's also quite fast, probably because that card has fast HBM2 memory (it has the same memory bandwidth as a 4090). And it was really cheap, since it was on deep sale as an outgoing model.
"Sometimes" as in "on some cards". You're having luck with yours, but that doesn't mean it's a good place to build a community.
Aside from the fact that gfx906 is one of the blessed architectures mentioned (so why would it not work?), how do you look at your specific instance and then turn around and say "All of you are lying, it works perfectly"? How do you square that circle in your head?