GaggiX 6 hours ago
At 4-bit quantization it should already fit quite nicely.

  Aurornis 6 hours ago
  Unfortunately not with a reasonable context length.

    kkzz99 5 hours ago
    It really depends on what you think a reasonable context length is, but I can get 50k–60k on a 4090.

    GaggiX 4 hours ago
    The model uses Gated DeltaNet and Gated Attention, so the memory usage of the KV cache is very low, even at BF16 precision.
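The claim in the last comment can be sanity-checked with a back-of-envelope estimate. In a hybrid architecture, only the standard-attention layers keep a per-token KV cache; the Gated DeltaNet layers carry a fixed-size recurrent state instead, so cache growth with context length is cut roughly in proportion to the attention-layer fraction. All architecture numbers below (48 layers, 1-in-4 attention layers, 8 KV heads, head dim 128) are illustrative assumptions, not confirmed specs for any particular model:

```python
def kv_cache_bytes(tokens, attn_layers, kv_heads, head_dim, bytes_per_elem):
    # 2x accounts for storing both keys and values per cached layer.
    return 2 * attn_layers * kv_heads * head_dim * bytes_per_elem * tokens

# Hypothetical 48-layer model at 60k context, BF16 (2 bytes/element):
full = kv_cache_bytes(60_000, 48, 8, 128, 2)    # if every layer cached K/V
hybrid = kv_cache_bytes(60_000, 12, 8, 128, 2)  # only 1 in 4 layers cache K/V

print(f"all-attention: {full / 2**30:.2f} GiB")   # ~11 GiB
print(f"hybrid:        {hybrid / 2**30:.2f} GiB") # ~2.7 GiB
```

Under these assumed numbers the hybrid cache is a quarter of the full-attention cache, which is consistent with the observation that long contexts fit on a 24 GB 4090 even without quantizing the cache.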