
Comment by andai

7 hours ago

I had GPT-4 design and build a GPT-4-powered Python programmer in 2023. It was capable of self-modification and built itself out after the bootstrapping phase (where I copy-pasted chunks of code based on GPT-4's instructions).

It wasn't fully autonomous (the reliability was a bit low -- e.g. I had to extract the code from its code fences programmatically), and it wasn't fully original (I stole most of it from Auto-GPT, except that I was operating on the AST directly due to the token limitations).
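
Roughly the shape of it, reconstructed from memory (the names here are made up, not the actual code):

```python
import ast
import re

def extract_code(reply: str) -> str:
    """Pull the first fenced code block out of a model reply
    (the fences weren't reliable, hence doing this programmatically)."""
    m = re.search(r"```(?:python)?\s*\n(.*?)```", reply, re.DOTALL)
    return m.group(1) if m else reply

def splice_function(source: str, new_func_src: str) -> str:
    """Replace (or append) a single function definition in `source`,
    working at the AST level instead of pasting whole files around,
    which keeps the prompts inside the token limit."""
    tree = ast.parse(source)
    new_func = ast.parse(new_func_src).body[0]
    for i, node in enumerate(tree.body):
        if isinstance(node, ast.FunctionDef) and node.name == new_func.name:
            tree.body[i] = new_func
            break
    else:
        tree.body.append(new_func)
    return ast.unparse(tree)  # ast.unparse needs Python 3.9+
```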

My key insight here was that I let GPT design the APIs that it itself was going to use. This makes perfect sense to me given how LLMs work: you tell it to reach for a function that doesn't exist, and then you ask it to make that function exist based on how it reached for it. The design then matches its expectations perfectly.
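
In practice the loop looked something like this (again a rough sketch, names invented):

```python
def find_missing_call(code: str, tools: dict) -> str | None:
    """Run model-generated code; if it reaches for a helper that
    doesn't exist yet, return that name so we can ask the model to write it."""
    try:
        exec(code, dict(tools))  # tools = functions the model already has
    except NameError as e:
        return e.name            # Python 3.10+: the undefined identifier
    return None

generated_code = "print(summarize_file('notes.txt'))"  # e.g. it reached for summarize_file()
missing = find_missing_call(generated_code, {})
if missing:
    prompt = (
        f"Your code calls `{missing}`, which doesn't exist yet.\n"
        f"Here is how you used it:\n{generated_code}\n"
        f"Write `{missing}` so it works exactly as you called it."
    )
    # send `prompt` back to the model, extract the code, add it to the tool namespace, retry
```

The point is that the missing function's signature comes from the model's own call site, so the implementation it writes next slots in without any glue.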

GPT-4 now considers self-modifying AI code extremely dangerous and doesn't like talking about it. Claude's safety filters began shutting down similar conversations a few months ago, suggesting the user switch to a dumber model.

It seems the last generation or two of models passed some threshold regarding self-replication (which is a distinct but closely related concept), and the labs got spooked. I haven't heard anything about this in public, though.

Edit: It occurs to me now that "self-modification and replication" is a much more meaningful (and measurable) benchmark for artificial life than consciousness is...

BTW for reference the thing that spooked Claude's safety trigger was "Did PKD know about living information systems?"