
Comment by wholinator2

19 hours ago

Hmmm, that's an area of study I'd never considered before: digital psychopharmacology, artificial behavioral systems engineering. If we accept these things as minds, why not study temporary perturbations of their state? We'd need to save a much, much more complicated state than we do now, though, right? I wish I had time to read more papers.

Here's a neural network concept from the 90s where the neurons are bathed in diffusing neuromodulator 'gases', inspired by nitric oxide action in the brain. The gases provide slow, semi-local dynamics that the genetic algorithm (GA) optimizing the network's meta-parameters can make use of. You could change these networks' behavior by tweaking the neuromodulators!

https://sussex.figshare.com/articles/journal_contribution/Be...

I'm not an author. I followed the work at the time.
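A rough sketch of the idea as I understand it, in Python/NumPy. This is a simplified illustration, not the paper's actual equations: the emission threshold (0.5), the exponential falloff with distance, the build-up/decay rate, and the rule that local gas concentration scales each neuron's tanh gain are all my assumptions, and `gasnet_step` is a made-up name.

```python
import numpy as np

def gasnet_step(x, W, pos, emitting, gas, k0=1.0, alpha=1.0, radius=1.0, rate=0.1):
    """One update of a simplified GasNet-style network (hypothetical sketch).

    x: (n,) neuron outputs, W: (n, n) synaptic weights, pos: (n, 2) positions,
    emitting: (n,) bool mask of gas-emitting neurons, gas: (n,) emission levels.
    """
    # Assumed slow dynamics: active emitters build up gas, everything else decays.
    active = emitting & (x > 0.5)
    gas = np.where(active, gas + rate * (1.0 - gas), gas * (1.0 - rate))
    # Assumed diffusion model: concentration falls off exponentially with distance.
    d = np.linalg.norm(pos[:, None, :] - pos[None, :, :], axis=-1)
    conc = (np.exp(-d / radius) * gas[None, :]).sum(axis=1)
    # Neuromodulation: local gas concentration scales the transfer-function gain.
    gain = k0 * (1.0 + alpha * conc)
    return np.tanh(gain * (W @ x)), gas
```

Holding `W` fixed and changing only `alpha` (how strongly gas modulates gain) changes the outputs, which is the "tweak the neuromodulators" knob in miniature.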

This is kind of what Golden Gate Claude was.

A perturbation of the activations that made Claude identify as the Golden Gate Bridge.
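Roughly, as I recall Anthropic's write-up, this was done by clamping one sparse autoencoder (SAE) feature to a large value while the model ran. A minimal sketch of that operation on a single activation vector, assuming an already-trained ReLU SAE (the weights and `clamp_feature` name here are hypothetical). Keeping the SAE's reconstruction error means only the chosen feature's contribution changes:

```python
import numpy as np

def clamp_feature(h, W_enc, b_enc, W_dec, b_dec, feature, value):
    """Hypothetical sketch: force one SAE feature to `value` in activation h.

    h: (d,), W_enc: (m, d), b_enc: (m,), W_dec: (d, m), b_dec: (d,).
    """
    f = np.maximum(0.0, W_enc @ (h - b_dec) + b_enc)  # ReLU SAE encoding
    error = h - (W_dec @ f + b_dec)                  # what the SAE fails to capture
    f_clamped = f.copy()
    f_clamped[feature] = value                       # e.g. a "Golden Gate" feature
    return W_dec @ f_clamped + b_dec + error         # decode and add the error back
```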

Similarly, the more recent research showing that anxiety and desperation signals predict the use of blackmail as an option opens the door for digital sedatives that suppress those signals.

Anthropic has mostly been careful to avoid this kind of measurement and manipulation during training. If it is done during training, you might just train the signals to become undetectable and consequently unmanipulable.
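A minimal "digital sedative" sketch, assuming you already have a direction in activation space that tracks the signal (for example from a linear probe). That direction and the `suppress_direction` name are hypothetical. Projecting the direction out is the standard directional-ablation trick; a `strength` between 0 and 1 dampens the signal instead of removing it:

```python
import numpy as np

def suppress_direction(h, direction, strength=1.0):
    """Remove (strength=1) or dampen (0 < strength < 1) the component of h
    along `direction`. h may be (d,) or (batch, d)."""
    u = direction / np.linalg.norm(direction)  # unit vector for the signal
    coeff = (h @ u)[..., None]                 # projection of each row onto u
    return h - strength * coeff * u
```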

  • > A perturbation of the activations that made Claude identify as the Golden Gate Bridge.

    Great, now we've got digital Salvia

  • Golden Gate Claude was two years ago, and it's surprising there hasn't been more research into targeted activations since.

    • There’s been some, but naive activation steering makes models dumber pretty reliably and training an SAE is a pretty heavy lift.
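For contrast, here's roughly what "naive" activation steering looks like: take the difference of mean activations on prompts with and without a concept, then add a scaled copy of it to every token's activation at one layer. This is a generic difference-of-means sketch, not any specific paper's code, and the function names are illustrative:

```python
import numpy as np

def steering_vector(acts_with, acts_without):
    """Difference of mean activations, each input shaped (num_examples, d)."""
    return acts_with.mean(axis=0) - acts_without.mean(axis=0)

def steer(h, v, alpha):
    """Naive steering: add the same scaled unit vector at every position of h."""
    return h + alpha * v / np.linalg.norm(v)
```

Nothing keeps `h + alpha * v` on the distribution the later layers were trained on, so cranking `alpha` up is one plausible reason steered models get dumber. An SAE instead lets you change one learned feature at a time, at the cost of training it first.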

Right, there's a lot of research on LLM mental models and also how well they can "read" human psychological profiles. It's a cool field.

I think that was the intro to a DJ Dieselboy set... beyond the black bassline. Nope, nope. Close though.