
Comment by verdverm

18 hours ago

I think training it to do that would be the hard part.

- stopping is probably the easy part

- I assume this happens during the RLHF phase

- Does the model simply stop or does it ask a question?

- You need a good response or interaction depending on the query, so probably sets or decision trees of them, or even something agentic? (chicken-and-egg problem?)

- This happens tens of thousands of times; having humans do it, especially with coding, is probably not realistic

- Incumbents like M$ with Copilot may have an advantage in crafting a dataset
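For the RLHF angle above, here's a minimal sketch of what a single preference record might look like if you trained "stop and ask" via preference optimization (e.g. DPO-style pairs). All field names and example strings are hypothetical, just to illustrate the shape of the data: the "chosen" completion asks a clarifying question while the "rejected" one guesses.

```python
# Hypothetical preference pair for teaching a model to ask a
# clarifying question instead of guessing. Field names are made up
# for illustration; real pipelines vary.

def make_preference_pair(prompt, clarifying_question, guessed_answer):
    """Build one preference record: the clarifying question is marked
    as the preferred ("chosen") response over a confident guess."""
    return {
        "prompt": prompt,
        "chosen": clarifying_question,   # model stops and asks
        "rejected": guessed_answer,      # model barrels ahead on an assumption
    }

pair = make_preference_pair(
    prompt="Write a function to parse the config file.",
    clarifying_question="Which format is the config file: JSON, YAML, or TOML?",
    guessed_answer="def parse_config(path): ...  # silently assumes JSON",
)
print(pair["chosen"])
```

The hard part the comment points at is producing tens of thousands of such pairs; a synthetic pipeline (one model generating ambiguous prompts, another judging whether asking was warranted) is one plausible way incumbents could scale this.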