Comment by moffkalast
16 hours ago
Looks like another DeepSeek distil, like the new Ministrals. For any other use case that would be an insult, but for coding it's a great approach, given how large a lead Qwen and DeepSeek have in coding performance over Mistral's internal datasets. The Small 24B seems to have a decent edge on 30B-A3B, though it'll be extremely slow to run by comparison.