← Back to context

Comment by rhdunn

5 days ago

Mistral have small variants (3B, 8B, 14B, etc.), as do others like IBM Granite and Qwen. Then there are finetunes based on these models, depending on your workflow/requirements.

True, but anything remotely useful is 300B and above

  • That is a very broad and silly position to take, especially in this thread.

    I use Devstral 2 and Gemini 3 daily.

    • Devstral 2 is 123B parameters. Thats less than 300B, but its still much larger than the 3-14B models GP was talking about.