Comment by verdverm
1 day ago
I think training it to do that would be the hard part.
- stopping is probably the easy part
- I assume this happens during the RLHF phase
- Does the model simply stop or does it ask a question?
- You need a good response or interaction, depending on the query, so probably sets or decision trees of them, or maybe even something agentic? (chicken-and-egg problem?)
- This happens tens of thousands of times; having humans do it, especially for coding, is probably not realistic
- Incumbents like M$ with Copilot may have an advantage in crafting a dataset