Comment by api
14 days ago
Looks like 109B would fit in a 64GiB machine's RAM at 4-bit quantization. Looking forward to trying this.
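A quick back-of-envelope check supports this (assuming a flat 4 bits per weight, and ignoring KV cache and quantization overhead like scales and zero-points):

```python
# Memory needed for 109B parameters at 4-bit quantization
# (assumption: 4 bits/weight, no KV cache or scale overhead).
params = 109e9
bytes_per_param = 4 / 8            # 4-bit = half a byte per weight
weight_gib = params * bytes_per_param / 2**30
print(f"~{weight_gib:.1f} GiB")    # ~50.8 GiB, under 64 GiB
```

Around 50.8 GiB for the weights alone, so it fits with some headroom for context.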
I read somewhere that the Ryzen AI 370 chip can run Gemma 3 14B at 7 tokens/second, so I'd expect Llama 4 Scout, with its 17B active parameters, to land somewhere in that range.
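Rough scaling sketch for that guess (assuming decode is memory-bandwidth bound, so tokens/s scales inversely with active parameter count at the same quantization and bandwidth):

```python
# Extrapolate decode speed from Gemma 3 14B to Llama 4 Scout
# (assumption: generation is bandwidth-bound, so speed ~ 1/active params).
gemma_tok_s = 7.0        # reported Gemma 3 14B speed on the chip
gemma_active_b = 14.0    # Gemma 3 active params, billions
scout_active_b = 17.0    # Llama 4 Scout active params, billions

scout_tok_s = gemma_tok_s * gemma_active_b / scout_active_b
print(f"~{scout_tok_s:.1f} tokens/s")  # ~5.8 tokens/s
```

So a bit under the Gemma number, all else equal.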