Comment by xlayn

1 month ago

I updated the results, with just the Devstral part, but ran the full suite for it, and posted all the results file as well as a script to re-run the process.

The results are more spectacular...

The model pointed way better in gsm8k, but lost a bit on the other categories.