Comment by macrolime

2 years ago

Any particular reason why that shouldn't work well with fine-tuning of an LLM using reinforcement learning?
