Comment by Aurornis 9 hours ago: Unfortunately not with a reasonable context length. (4 comments)
regularfry 3 hours ago: I've got 139k context with the UD-Q4_K_XL on a 4090, with q8_0 ctk/v. Could probably squeeze out a little more, but that's enough for me for the moment.

corysama 1 hour ago: Hey, buddy! Can I bum a command line arg list off ya?
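The "q8_0 ctk/v" wording suggests llama.cpp's `-ctk`/`-ctv` cache-type flags. A plausible launch command for the setup described above might look like the sketch below; the model filename is an assumption (regularfry doesn't name the GGUF file), and the exact context value is rounded.

```shell
# Sketch of a llama.cpp server launch matching the described setup:
# UD-Q4_K_XL quant, q8_0 K/V cache, ~139k context on a single 4090.
# The .gguf filename is a placeholder, not taken from the thread.
./llama-server \
  -m model-UD-Q4_K_XL.gguf \
  -c 139264 \
  -ngl 99 \
  -ctk q8_0 \
  -ctv q8_0 \
  --flash-attn
```

`-c` sets the context length, `-ngl 99` offloads all layers to the GPU, and `-ctk`/`-ctv q8_0` quantize the K and V caches to 8-bit (quantized V cache requires flash attention in llama.cpp).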
kkzz99 9 hours ago: It really depends on what you think a reasonable context length is, but I can get 50k-60k on a 4090.
GaggiX 8 hours ago: The model uses Gated DeltaNet and Gated Attention, so the memory usage of the KV cache is very low, even at BF16 precision.
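The reason a hybrid of this kind keeps KV memory low: the Gated DeltaNet layers carry a fixed-size recurrent state that does not grow with context, so only the (much smaller number of) full-attention layers need a per-token KV cache. A rough back-of-the-envelope, with all architecture numbers below being illustrative assumptions rather than the actual model config:

```python
# Illustrative KV-cache estimate for a hybrid-attention model.
# All layer/head/dim numbers are assumptions for illustration only.

def kv_cache_bytes(n_attn_layers, n_kv_heads, head_dim, ctx_len, bytes_per_elem):
    # K and V each store n_kv_heads * head_dim values per token per layer.
    return 2 * n_attn_layers * n_kv_heads * head_dim * ctx_len * bytes_per_elem

ctx = 139_000
# Suppose 48 layers total, but only 1 in 4 uses full (gated) attention;
# the DeltaNet layers keep constant-size state regardless of context.
full_attn_layers = 48 // 4       # 12 layers actually hold a KV cache
kv_heads, head_dim = 2, 256      # GQA keeps the KV head count small

hybrid = kv_cache_bytes(full_attn_layers, kv_heads, head_dim, ctx, 2)  # BF16
dense = kv_cache_bytes(48, kv_heads, head_dim, ctx, 2)                 # if every layer cached

print(f"hybrid: {hybrid / 2**30:.1f} GiB, all-attention: {dense / 2**30:.1f} GiB")
```

Under these made-up numbers the hybrid caches about a quarter of what a same-depth all-attention model would, which is why six-figure contexts fit in 24 GB alongside the quantized weights.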