Comment by qeternity

5 days ago

120B MoE. The 20B is dense.

As far as dense models go, it’s larger than many, but Mistral has released multiple 120B dense models, not to mention Llama3 405B.

For posterity: it has since been shown that it is actually MoE:

> 21B parameters with 3.6B active parameters

How much RAM do you need to run this?!

  • Probably about one byte per weight (parameter), plus a bit extra for the key-value cache (which depends on the size of the context window); see the sketch after this list.

    • You can go below one byte per parameter. 4-bit quantization is fairly popular. It does affect quality - for some models more so than others - but, generally speaking, a 4-bit quantized model is still going to do significantly better than an 8-bit model with half as many parameters.
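
A back-of-the-envelope sketch in Python of the estimate above: weights at some number of bytes per parameter, plus a standard transformer KV cache. Note that for an MoE you size memory by *total* parameters, not active ones. The layer count, KV-head count, and head dimension in the example are made-up illustrative numbers, not the model's published config:

```python
# Rough RAM estimate for local LLM inference (back-of-the-envelope, not exact):
#   weights  = total_params * bytes_per_param
#   KV cache = 2 (K and V) * layers * kv_heads * head_dim * kv_bytes * context_len

def estimate_ram_gb(
    total_params: float,     # total parameter count (MoE: count ALL experts)
    bytes_per_param: float,  # 2.0 = fp16/bf16, 1.0 = int8, 0.5 = int4
    n_layers: int,
    n_kv_heads: int,
    head_dim: int,
    context_len: int,
    kv_bytes: float = 2.0,   # KV cache is usually kept in fp16
) -> float:
    weights = total_params * bytes_per_param
    kv_cache = 2 * n_layers * n_kv_heads * head_dim * kv_bytes * context_len
    return (weights + kv_cache) / 1e9

# Hypothetical shapes for a ~21B-total / 3.6B-active MoE at an 8k context;
# the architecture numbers below are illustrative assumptions only.
print(f"int8: {estimate_ram_gb(21e9, 1.0, 32, 8, 128, 8192):.1f} GB")  # ~22 GB
print(f"int4: {estimate_ram_gb(21e9, 0.5, 32, 8, 128, 8192):.1f} GB")  # ~12 GB
```

With these assumed shapes the KV cache adds only about 1 GB at 8k context, so the weight bytes dominate - which is why quantization is the main lever for fitting a model in RAM.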