Comment by rhdunn
5 days ago
Mistral have small variants (3B, 8B, 14B, etc.), as do others like IBM Granite and Qwen. Then there are finetunes based on these models, depending on your workflow/requirements.
5 days ago
Mistral have small variants (3B, 8B, 14B, etc.), as do others like IBM Granite and Qwen. Then there are finetunes based on these models, depending on your workflow/requirements.
True, but anything remotely useful is 300B and above
That is a very broad and silly position to take, especially in this thread.
I use Devstral 2 and Gemini 3 daily.
Devstral 2 is 123B parameters. Thats less than 300B, but its still much larger than the 3-14B models GP was talking about.