Comment by rolisz
5 hours ago
I'm interested in trying something similar. I was thinking of doing this for my OpenClaw agent.
About Owain Evans's work: I think he used SFT. Someone on Twitter was saying that RL is not as susceptible to what he showed. I'd like to try that.