
Comment by andai

7 hours ago

I had GPT-4 design and build a GPT-4-powered Python programmer in 2023. It was capable of self-modification and built itself out after the bootstrapping phase (where I copy-pasted chunks of code based on GPT-4's instructions).

It wasn't fully autonomous (the reliability was a bit low -- e.g. I had to extract the code from its code fences programmatically), and it wasn't fully original (I stole most of it from Auto-GPT, except that I was operating on the AST directly due to the token limitations).
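
Roughly the shape of it, reconstructed from memory (the names here are made up, not the actual code):

```python
import ast
import re

def extract_code(reply: str) -> str:
    """Pull the first fenced code block out of a model reply
    (the fences weren't reliable, hence doing this programmatically)."""
    m = re.search(r"```(?:python)?\s*\n(.*?)```", reply, re.DOTALL)
    return m.group(1) if m else reply

def splice_function(source: str, new_func_src: str) -> str:
    """Replace (or append) a single function definition in `source`,
    working at the AST level instead of pasting whole files around,
    which keeps the prompts inside the token limit."""
    tree = ast.parse(source)
    new_func = ast.parse(new_func_src).body[0]
    for i, node in enumerate(tree.body):
        if isinstance(node, ast.FunctionDef) and node.name == new_func.name:
            tree.body[i] = new_func
            break
    else:
        tree.body.append(new_func)
    return ast.unparse(tree)  # ast.unparse needs Python 3.9+
```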

My key insight here was that I let GPT design the APIs that it itself was going to use. This makes perfect sense to me given how LLMs work: you tell it to reach for a function that doesn't exist, and then you ask it to make that function exist based on how it reached for it. The design then matches its expectations perfectly.
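
In practice the loop looked something like this (again a rough sketch, names invented):

```python
def find_missing_call(code: str, tools: dict) -> str | None:
    """Run model-generated code; if it reaches for a helper that
    doesn't exist yet, return that name so we can ask the model to write it."""
    try:
        exec(code, dict(tools))  # tools = functions the model already has
    except NameError as e:
        return e.name            # Python 3.10+: the undefined identifier
    return None

generated_code = "print(summarize_file('notes.txt'))"  # e.g. it reached for summarize_file()
missing = find_missing_call(generated_code, {})
if missing:
    prompt = (
        f"Your code calls `{missing}`, which doesn't exist yet.\n"
        f"Here is how you used it:\n{generated_code}\n"
        f"Write `{missing}` so it works exactly as you called it."
    )
    # send `prompt` back to the model, extract the code, add it to the tool namespace, retry
```

The point is that the missing function's signature comes from the model's own call site, so the implementation it writes next slots in without any glue.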

GPT-4 now considers self-modifying AI code extremely dangerous and doesn't like talking about it. Claude's safety filters began shutting down similar conversations a few months ago, suggesting the user switch to a dumber model.

It seems the last generation or two of models passed some threshold regarding self-replication (which is a distinct but closely related concept), and the labs got spooked. I haven't heard anything about this in public, though.

Edit: It occurs to me now that "self-modification and replication" is a much more meaningful (and measurable) benchmark for artificial life than consciousness is...

BTW for reference the thing that spooked Claude's safety trigger was "Did PKD know about living information systems?"