Comment by bitexploder
12 hours ago
That is part of it. They are also trained to think in very well-mapped areas of their model. All the RLHF, etc. is tuned on their CoT and on user feedback on responses.