Comment by ericbarrett
3 hours ago
> GPT-4 now considers self modifying AI code to be extremely dangerous and doesn't like talking about it. Claude's safety filters began shutting down similar conversations a few months ago, suggesting the user switch to a dumber model.
I speculate that this has more to do with recent high-profile cases of self-harm related to "AI psychosis" than with any AGI-adjacent danger. I've read a few of the chat transcripts made public in related lawsuits, and there seems to be a recurring theme of recursive or self-modifying enlightenment role-played by the LLM. Discouraging exploration of these themes would be a logical change by the vendors.