Comment by virangjhaveri 2 days ago
Do you reward the RL model based on token consumption when multiple LLMs complete the task?

Reply by adchurch 1 day ago
Effectively yes (based on cost though, not raw token count)