Comment by dofm

1 day ago

It's the one I have loaded right now.

It got rather tangled up when I tried it with one of my coding tests, which is a simple WordPress plugin, but I frustrate the model by asking it to write code for older PHP versions, break WP coding conventions, and use a rather bespoke method for arranging code in objects. So it is a sort of hybrid of a greenfield and a brownfield task; a bit muddy.

It did not do as well as Qwen 3.6 35B, but the way it worked through its thoughts was interesting.

TBH, I struggled to understand what DeepReinforce are doing that is materially different; the explanation of their training technique goes over my head at this point.