Comment by lyu07282
1 day ago
can you give some real-world examples for when this would be useful? Does this work for tasks requiring tool calling as well?
1 day ago
Yes, tool calling is a prime example!! I.e., you have some specific task and a final output that involves some tools, but sadly the steps to call the tools, the stuff in between, and the thinking process are all missing.
You can employ GRPO and maybe add an actual Python environment for the model to learn to act in.
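A rough sketch of what the "Python environment" part could look like as a reward signal: pull the code out of each completion, run it in a fresh interpreter, and reward completions whose output matches the known final answer. Everything here is an assumption for illustration, not the actual Unsloth or TRL setup: the `<code>...</code>` tag format, the `expected` column name, and the helper names `run_python` and `execution_reward` are all made up. The function is only shaped like a typical GRPO reward function (a list of completions in, a list of floats out), and a subprocess with a timeout is not a real sandbox.

```python
import re
import subprocess
import sys
from typing import List, Optional

# Hypothetical convention: the model wraps its tool call in <code>...</code> tags.
CODE_TAG = re.compile(r"<code>(.*?)</code>", re.DOTALL)


def run_python(code: str, timeout: float = 5.0) -> Optional[str]:
    """Run `code` in a fresh interpreter; return stripped stdout, or None on failure."""
    try:
        result = subprocess.run(
            [sys.executable, "-c", code],
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return None
    if result.returncode != 0:
        # The code raised an exception or exited with an error.
        return None
    return result.stdout.strip()


def execution_reward(completions: List[str], expected: List[str], **kwargs) -> List[float]:
    """Give 1.0 when the executed code prints the expected answer, else 0.0."""
    rewards = []
    for completion, target in zip(completions, expected):
        match = CODE_TAG.search(completion)
        if match is None:
            # No tool call found in the completion.
            rewards.append(0.0)
            continue
        output = run_python(match.group(1))
        rewards.append(1.0 if output == str(target).strip() else 0.0)
    return rewards
```

Since only the final output is checked, the model is free to figure out the intermediate steps on its own, which is the point when those steps are missing from the data. For anything beyond a toy, the execution should happen in a properly isolated environment.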
I'm waiting for https://github.com/huggingface/trl/pull/2810 to land. I think this should work with the existing Unsloth setup without changes.
Oh yes!! Will has definitely been on a roll!! Excited for the PR as well!