Comment by Mil0dV
9 hours ago
Does this also apply to Gemma's 26B-A4B vs, say, Qwen's 35B-A3B?
I'm not sure if I can make the 35B-A3B work with my 32GB machine.
It should be easy with a Q4 (quantization to 4 bits per weight) and a smallish context.
You won't have much RAM left over though :-/. At Q4, the weights alone are ~20 GiB:
https://huggingface.co/unsloth/Qwen3.5-35B-A3B-GGUF
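The ~20 GiB figure can be sanity-checked with a quick back-of-the-envelope calculation. A minimal sketch, assuming ~4.5 effective bits per weight for a Q4 GGUF (block-wise quants like Q4_K_M store per-block scales on top of the 4-bit values, so the real rate is a bit above 4.0; the exact overhead varies by quant type):

```python
def quantized_size_gib(n_params_billion: float, bits_per_weight: float = 4.5) -> float:
    """Approximate in-RAM size of the quantized weights in GiB.

    bits_per_weight is an assumption for a typical Q4 GGUF; it does not
    include the KV cache or runtime overhead, which grow with context length.
    """
    bytes_total = n_params_billion * 1e9 * bits_per_weight / 8
    return bytes_total / (1024 ** 3)

# A 35B-parameter model at ~4.5 bpw:
print(f"{quantized_size_gib(35):.1f} GiB")  # ~18.3 GiB for weights alone
```

That leaves roughly 18-20 GiB for weights on a 32 GB machine, so a few GiB remain for the KV cache and the OS, which is why a smallish context is the practical constraint.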