
Comment by Animats

14 hours ago

Link?

It's interesting that people are writing tools that go inside the weights and do things. We're getting past the black box era of LLMs.

That may or may not be a good thing.

I believe this has already been done to several models. One example I've come across is the JOSIEfied models from Gökdeniz Gülmez. I downloaded one or two and tried them on a local Ollama setup. They do generate potentially dangerous output. Turning on thinking for the Qwen series shows how the model arrives at its conclusions, and it's quite disturbing.

However, after a few rounds of conversation, they get stuck in loops and repeat the same things over and over. The main JOSIE models worked the best of all and were still useful even after abliteration.