Comment by cgdl
10 days ago
Very cool. For the INT4 QAT model, what is the recommended precision for the activations and for the keys and values stored in the KV cache?
10 days ago
For keys, you probably want at least q5 or q6; for values, q4 is fine.
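
To get a rough feel for what that mixed setting costs in memory, here is a back-of-the-envelope sketch. The function name and the model dimensions are made up for illustration, and it treats q6/q4 as exactly 6 and 4 bits per element, ignoring the per-block scale overhead that real quantization formats add.

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, n_tokens, key_bits, value_bits):
    """Approximate KV cache size in bytes (hypothetical helper).

    Assumes one key element and one value element per
    (layer, kv head, head dimension, token), and ignores
    quantization block scales and any padding.
    """
    elements = n_layers * n_kv_heads * head_dim * n_tokens
    total_bits = elements * (key_bits + value_bits)
    return total_bits / 8


# Hypothetical model shape, chosen only to make the comparison concrete.
shape = dict(n_layers=32, n_kv_heads=8, head_dim=128, n_tokens=8192)

fp16 = kv_cache_bytes(**shape, key_bits=16, value_bits=16)
mixed = kv_cache_bytes(**shape, key_bits=6, value_bits=4)

print(f"fp16 K/V : {fp16 / 2**20:.1f} MiB")
print(f"q6 K/q4 V: {mixed / 2**20:.1f} MiB ({mixed / fp16:.0%} of fp16)")
```

Under these assumptions the ratio depends only on the bit-widths, so the same proportion holds for any model shape or context length.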