Comment by virangjhaveri

2 days ago

Do you reward the RL model based on token consumption when multiple LLMs complete the task?
