Aurornis 6 hours ago Unfortunately not with a reasonable context length.
kkzz99 5 hours ago It really depends on what you think a reasonable context length is, but I can get 50k-60k tokens of context on a 4090.
GaggiX 4 hours ago The model uses Gated DeltaNet and Gated Attention so the memory usage of the KV cache is very low, even at BF16 precision.
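For anyone who wants to sanity-check the KV cache point, here is a rough back-of-the-envelope sketch. The layer counts, head counts, and head dimension are made-up placeholders, not the real config of the model being discussed, and the 1-in-4 attention ratio is only an assumption for illustration. The idea is that only the attention layers keep a per-token K/V cache, while linear-attention layers such as Gated DeltaNet keep a fixed-size state that does not grow with context length.

```python
def kv_cache_bytes(num_attn_layers, num_kv_heads, head_dim, seq_len, bytes_per_elem=2):
    """Approximate KV cache size for the attention layers only.

    The factor of 2 covers keys and values; bytes_per_elem=2 corresponds to BF16.
    """
    return 2 * num_attn_layers * num_kv_heads * head_dim * seq_len * bytes_per_elem


# Hypothetical numbers for illustration only (not taken from any real model card).
TOTAL_LAYERS = 48
ATTN_LAYERS_HYBRID = TOTAL_LAYERS // 4  # assumption: 1 in 4 layers uses attention
NUM_KV_HEADS = 4
HEAD_DIM = 128
SEQ_LEN = 60_000  # roughly the context length mentioned above

full = kv_cache_bytes(TOTAL_LAYERS, NUM_KV_HEADS, HEAD_DIM, SEQ_LEN)
hybrid = kv_cache_bytes(ATTN_LAYERS_HYBRID, NUM_KV_HEADS, HEAD_DIM, SEQ_LEN)

GIB = 1024 ** 3
print(f"all-attention stack: {full / GIB:.2f} GiB")
print(f"hybrid stack:        {hybrid / GIB:.2f} GiB")
```

This ignores the weights, activations, and the recurrent state of the linear-attention layers, so it only says something about how the cache scales with context, not about total VRAM use.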