Comment by api

14 days ago

Looks like the 109B model would fit in a 64 GiB machine's RAM at 4-bit quantization. Looking forward to trying this.
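A quick back-of-envelope check of that fit (rough sketch only; it counts weight bytes and ignores KV cache, OS, and runtime overhead):

```python
# Does a 109B-parameter model at 4-bit quantization fit in 64 GiB of RAM?
params = 109e9
bits_per_param = 4                       # e.g. a Q4-style quant
weight_bytes = params * bits_per_param / 8

gib = 1024**3
print(f"weights: {weight_bytes / gib:.1f} GiB of 64 GiB")
```

That comes out around 51 GiB of weights, so it fits with some headroom left for context and the rest of the system.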

I read somewhere that the Ryzen AI 370 chip can run Gemma 3 14B at 7 tokens/second, so I'd expect Llama 4 Scout, with 17B active parameters, to perform somewhere in that range.
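Assuming decoding is memory-bandwidth bound, throughput scales roughly inversely with the active-parameter bytes read per token, which gives a crude estimate from that cited figure (the 14B and 7 tok/s numbers are from the comment above, not measured):

```python
# Crude scaling: tokens/s is ~inversely proportional to active params,
# assuming the same hardware, quantization, and bandwidth-bound decoding.
ref_active_b = 14      # Gemma 3 14B (dense, all params active), per the cited figure
ref_tok_s = 7.0        # reported tokens/second on the Ryzen AI chip
scout_active_b = 17    # Llama 4 Scout activates ~17B params per token

est_tok_s = ref_tok_s * ref_active_b / scout_active_b
print(f"~{est_tok_s:.1f} tokens/s")
```

That lands a little under 6 tokens/second, i.e. "somewhere in that range" as guessed.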