armarr 17 hours ago
There are already quantizations available.

giancarlostoro 9 hours ago
It would be nice to run a model that fits in 12GB of VRAM without being quantized to death, so I still have room for a reasonable context window. But also, this is ONE model in a set of models; the rest of them apparently need to run on a GPU cluster.