Comment by khimaros
21 hours ago
it's great to see this kind of progress in reproducible weights, but color me confused. this claims to be better and smaller than Devstral-Small-2-24B, while clocking in at 32B (larger) and scoring more poorly?
Hey! We are able to outperform Devstral-Small-2-24B when specializing on repositories, and our best SERA-32B model comes well within the range of uncertainty. That being said, our model is a bit larger than Devstral 24B. Could you point out what in the paper gave the impression that we were smaller? If there's something unclear, we would love to revise it.
"SERA-32B is the first model in Ai2's Open Coding Agents series. It is a state-of-the-art open-source coding agent that achieves 49.5% on SWE-bench Verified, matching the performance of much larger models like Devstral-Small-2 (24B)" from https://huggingface.co/allenai/SERA-32B
Ah, great catch. I don't know how we missed that. Thanks! Will fix.