giancarlostoro 19 hours ago
> the dense 9B fits on a single 80GB GPU

Us mere mortals cannot use this.

regularfry 2 hours ago
Seems weird. A 9B model would normally fit unquantised on a 24GB GPU.

armarr 13 hours ago
There are already quantizations available.

giancarlostoro 5 hours ago
It would be nice to run a model that isn't quantized to death so it fits in 12GB of VRAM, so I have room for a reasonable context window. But also, this is ONE model in a set of models; the rest of the models apparently need to run on a GPU cluster.
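For anyone checking the numbers in this thread, here is a rough back-of-the-envelope sketch of the weights-only arithmetic. It assumes "unquantised" means 16-bit weights (bf16/fp16) and that a "9B" model has exactly 9e9 parameters. The function names and the bytes-per-parameter table are illustrative, not taken from any library, and the estimate ignores KV cache, activations, and runtime overhead, which all add to real VRAM use.

```python
# Weights-only memory estimate. Assumption: sizes are in GB (10^9 bytes),
# and KV cache, activations, and framework overhead are NOT included.
BYTES_PER_PARAM = {"fp32": 4.0, "bf16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_memory_gb(n_params: float, dtype: str) -> float:
    """Approximate size of the model weights in GB for a given dtype."""
    return n_params * BYTES_PER_PARAM[dtype] / 1e9


def fits(n_params: float, dtype: str, vram_gb: float) -> bool:
    """True if the weights alone fit in vram_gb (no room reserved for context)."""
    return weight_memory_gb(n_params, dtype) <= vram_gb


if __name__ == "__main__":
    n_params = 9e9  # hypothetical exact parameter count for a "9B" model
    for dtype in BYTES_PER_PARAM:
        size = weight_memory_gb(n_params, dtype)
        print(f"{dtype}: {size:.1f} GB, "
              f"fits 12GB: {fits(n_params, dtype, 12)}, "
              f"fits 24GB: {fits(n_params, dtype, 24)}")
```

Under these assumptions the 16-bit weights come to about 18 GB, which is consistent with the 24GB remark and also with the complaint that a 12GB card needs 8-bit or lower quantization before any context window is accounted for.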