Comment by smcleod

1 year ago

I get around 4-5 t/s with the Unsloth 1.58-bit quant on my home server, which has 2x 3090 and a Ryzen 9 with 192GB of DDR5. Usable but slow.

segmondy 1 year ago

How much context?

smcleod 1 year ago

Just 4K. DeepSeek doesn't support flash attention, which means you can't run a quantised K/V cache.
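
To show roughly why an unquantised cache limits context, here is a back-of-the-envelope sketch in Python. The model shape is a hypothetical placeholder rather than DeepSeek's real configuration, `kv_cache_bytes` is just an illustrative helper, and the 1 byte per element figure for an 8-bit cache ignores the small per-block scale overhead.

```python
def kv_cache_bytes(n_layers, n_kv_heads, k_head_dim, v_head_dim, n_ctx, bytes_per_elem):
    """Approximate size in bytes of a plain (uncompressed-layout) K/V cache."""
    # Each token stores one K vector and one V vector per head, per layer.
    per_token_elems = n_layers * n_kv_heads * (k_head_dim + v_head_dim)
    return int(per_token_elems * n_ctx * bytes_per_elem)


# Hypothetical shape chosen only for illustration (assumption, not a real model config).
shape = dict(n_layers=60, n_kv_heads=128, k_head_dim=128, v_head_dim=128)

f16_bytes = kv_cache_bytes(**shape, n_ctx=4096, bytes_per_elem=2.0)  # 16-bit cache
q8_bytes = kv_cache_bytes(**shape, n_ctx=4096, bytes_per_elem=1.0)   # ~8-bit cache

GIB = 2**30
print(f"f16 cache at 4K context: {f16_bytes / GIB:.1f} GiB")
print(f"q8 cache at 4K context:  {q8_bytes / GIB:.1f} GiB")
```

The cache grows linearly with context length, so without the option to quantise it, every doubling of context doubles memory that would otherwise hold model weights.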