Comment by RandyOrion
1 month ago
I didn't see many papers on solving this problem.
I see non-stop response as a generalization problem because normally every training sample is not of infinite length.
Targeted supervised fine-tuning should work, as long as you have enough samples. However, supervised fine-tuning is not good for generalization.
No comments yet
Contribute on Hacker News ↗