Nerd_Nest 6 months ago
Whoa, 120B? That’s huge.

qeternity 6 months ago
120B MoE. The 20B is dense.
As far as dense models go, it’s larger than many, but Mistral has released multiple 120B dense models, not to mention Llama 3 405B.

nivvis 6 months ago
For posterity: it has since been shown that the 20B is actually MoE as well.
> 21B parameters with 3.6B active parameters

sciencesama 6 months ago
How much RAM do you need to run this?

cubefox 6 months ago
Probably about one byte per weight (parameter), plus a bit extra for the key-value cache (which depends on the size of the context window).
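To put rough numbers on the "one byte per weight plus KV cache" rule of thumb, here is a minimal back-of-the-envelope sketch. The parameter counts (120B and 21B) come from the thread; the layer count, KV head count, head dimension, and context length are hypothetical placeholders, not the real architecture of either model, and the KV-cache formula assumes plain attention that stores one key and one value vector per layer, per KV head, per token. Note that for an MoE model this counts all parameters, not just the active ones, on the assumption that every expert has to be resident in memory.

```python
GIB = 1024 ** 3


def weight_bytes(n_params: float, bytes_per_weight: float = 1.0) -> float:
    """Memory for the weights alone; 1.0 byte per weight matches the estimate above."""
    return n_params * bytes_per_weight


def kv_cache_bytes(
    n_layers: int,
    n_kv_heads: int,
    head_dim: int,
    context_len: int,
    bytes_per_value: float = 2.0,  # assumption: 16-bit cache entries
) -> float:
    """KV cache for one sequence: a key and a value vector (hence the factor 2)
    per layer, per KV head, per token in the context window."""
    return 2 * n_layers * n_kv_heads * head_dim * context_len * bytes_per_value


if __name__ == "__main__":
    # Hypothetical architecture values, chosen only for illustration.
    kv = kv_cache_bytes(n_layers=32, n_kv_heads=8, head_dim=128, context_len=8192)
    for name, n_params in [("120B", 120e9), ("21B", 21e9)]:
        w = weight_bytes(n_params)
        print(f"{name}: weights ~{w / GIB:.1f} GiB, KV cache ~{kv / GIB:.2f} GiB")
```

The KV cache grows linearly with the context window, so doubling `context_len` doubles that term while the weight term stays fixed. Runtime overhead (activations, buffers, the OS) comes on top of both.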