Comment by smcleod
1 year ago
I get around 4-5 t/s with the Unsloth 1.58-bit quant on my home server (2x 3090, Ryzen 9, 192GB of DDR5). Usable but slow.
How much context?
Just 4K. DeepSeek doesn't allow for the use of flash attention, which means you can't run a quantised KV cache.
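
For a rough sense of why an unquantised cache limits context, here is a back-of-the-envelope sketch. It assumes a plain per-head KV cache with separate K and V head sizes; the function name and every number in the example call are illustrative placeholders, not measured or confirmed values for DeepSeek or for any particular runtime.

```python
def kv_cache_bytes(n_layers, n_kv_heads, k_head_dim, v_head_dim,
                   n_ctx, bytes_per_elem=2.0):
    """Estimate KV cache size in bytes.

    Each token stores one K vector and one V vector per layer and per KV head.
    bytes_per_elem is 2.0 for f16; a smaller value roughly models a quantised cache.
    """
    per_token = n_layers * n_kv_heads * (k_head_dim + v_head_dim) * bytes_per_elem
    return per_token * n_ctx


# Hypothetical model shape, chosen only to illustrate the scaling.
f16_size = kv_cache_bytes(n_layers=60, n_kv_heads=128,
                          k_head_dim=192, v_head_dim=128, n_ctx=4096)
print(f"f16 cache at 4K context: {f16_size / 2**30:.2f} GiB")

# Roughly 8 bits per element, ignoring per-block scale overhead.
q8_size = kv_cache_bytes(60, 128, 192, 128, 4096, bytes_per_elem=1.0)
print(f"~8-bit cache at 4K context: {q8_size / 2**30:.2f} GiB")
```

The estimate grows linearly with context length, so under these assumptions doubling the context doubles the cache, while halving the bytes per element would let the same memory hold about twice the context.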