GaggiX 10 hours ago
At 4-bit quantization it should already fit quite nicely.

    Aurornis 9 hours ago
    Unfortunately not with a reasonable context length.

        regularfry 3 hours ago
        I've got 139k context with the UD-Q4_K_XL on a 4090, q8_0 ctk/v. Could probably squeeze a little more, but that's enough for me for the moment.

            corysama 1 hour ago
            Hey, buddy! Can I bum a command line arg list off ya?

        kkzz99 9 hours ago
        It really depends on what you think a reasonable context length is, but I can get 50k-60k on a 4090.

        GaggiX 8 hours ago
        The model uses Gated DeltaNet and Gated Attention, so the memory usage of the KV cache is very low, even at BF16 precision.
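The settings regularfry mentions map roughly onto llama.cpp's `llama-server` flags like this. A sketch only, assuming llama.cpp is the runtime (the thread doesn't say); the flags are real llama.cpp options, but the GGUF filename is a placeholder and the exact layer count to offload depends on the model:

```shell
# Sketch of a llama-server invocation matching the settings described above:
#   --ctx-size 139264    ~139k-token context window
#   -ctk / -ctv q8_0     quantize the KV cache keys/values to 8-bit
#   --flash-attn         required by llama.cpp for quantized V cache
#   -ngl 99              offload all layers to the GPU (4090)
# The model filename below is a placeholder, not an actual file.
llama-server \
  -m model-UD-Q4_K_XL.gguf \
  --ctx-size 139264 \
  -ctk q8_0 -ctv q8_0 \
  --flash-attn \
  -ngl 99
```

The `-ctk`/`-ctv` pair is what "q8_0 ctk/v" refers to: storing the cache at 8-bit roughly halves its footprint versus FP16, which is where much of the extra context headroom comes from.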
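For context on GaggiX's point: in a standard transformer, KV-cache memory grows linearly with context length, which is what makes long contexts expensive on a single GPU. A back-of-the-envelope sketch (the config numbers are illustrative, not this model's actual dimensions):

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, seq_len, bytes_per_elt=2):
    """KV-cache size for standard attention: keys + values, per layer."""
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_elt

# Illustrative full-attention config: 48 layers, 8 KV heads, head_dim 128,
# 64k-token context at BF16 (2 bytes per element).
gib = kv_cache_bytes(48, 8, 128, 65536, 2) / 2**30
print(f"{gib:.1f} GiB")  # -> 12.0 GiB
```

A hybrid model whose layers are mostly linear-attention (Gated DeltaNet) keeps a fixed-size recurrent state per layer regardless of sequence length, so only the minority of full-attention layers pay this per-token cost; that's why the cache stays small even at BF16.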