Nerd_Nest 5 days ago
Whoa, 120B? That’s huge.
qeternity 5 days ago
120B MoE. The 20B is dense.
As far as dense models go, it’s larger than many, but Mistral has released multiple 120B dense models, not to mention Llama3 405B.
nivvis 18 hours ago
For posterity: it has since been shown that it is actually MoE.
> 21B parameters with 3.6B active parameters
sciencesama 5 days ago
How much RAM do you need to run this?
cubefox 5 days ago
Probably about one byte per weight (parameter), plus a bit extra for the key-value cache (which depends on the size of the context window).
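The rule of thumb above can be turned into a quick back-of-envelope calculation. This is only a rough sketch: the function names are made up for illustration, the one-byte-per-weight default simply mirrors the comment, and the layer/head figures in the usage lines are hypothetical placeholders, not values from any model card. Note that for an MoE model the total parameter count (not the active count) is what has to fit in memory if all experts are kept loaded.

```python
def weights_bytes(n_params: float, bytes_per_weight: float = 1.0) -> float:
    """Memory for the weights alone: parameter count times bytes per weight."""
    return n_params * bytes_per_weight


def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   context_len: int, bytes_per_value: int = 2) -> int:
    """Key-value cache size for one sequence.

    Each token stores one key and one value vector (the factor of 2)
    per layer and per KV head, so the cache grows linearly with context.
    """
    return 2 * n_layers * n_kv_heads * head_dim * context_len * bytes_per_value


def total_gb(n_params: float, kv_bytes: float = 0.0,
             bytes_per_weight: float = 1.0) -> float:
    """Weights plus KV cache, in decimal gigabytes (1 GB = 1e9 bytes)."""
    return (weights_bytes(n_params, bytes_per_weight) + kv_bytes) / 1e9


# Hypothetical architecture numbers, chosen only to show the arithmetic.
example_kv = kv_cache_bytes(n_layers=32, n_kv_heads=8, head_dim=128,
                            context_len=8192, bytes_per_value=2)

print(total_gb(120e9))             # weights only for a 120B model at 1 byte/weight
print(total_gb(21e9, example_kv))  # 21B model plus the example KV cache
```

At one byte per weight, the weights dominate: roughly 120 GB for 120B parameters and 21 GB for 21B, with the KV cache adding on the order of a gigabyte for the placeholder configuration above. Real requirements also include activations and runtime overhead, which this sketch ignores.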