
Comment by ritzaco

8 hours ago

I'm surprised - I haven't gotten anywhere near as dark as this, but I've tried some things out of curiosity and the safety always seemed tuned very high to me; it would just say "Sorry, I can't help with that" the moment you started asking for anything dodgy.

I wonder if they A/B test the safety rails, or if longer conversations that gradually turn darker are what get past them.

The way LLMs work, the outputs are probabilistic, not deterministic.

So the guardrails might only fail one in a thousand times.
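To put a rough number on that, here's a back-of-the-envelope sketch (assuming a fixed, independent per-message failure rate, which real systems won't literally have): even a one-in-a-thousand slip rate compounds quickly over a long conversation.

```python
# Rough sketch: if a guardrail independently fails with probability p on
# each message, the chance of at least one failure grows with conversation
# length as 1 - (1 - p)^n. The rate here is purely hypothetical.
p = 1 / 1000  # assumed per-message failure probability

for n in (10, 100, 1000, 5000):
    at_least_one = 1 - (1 - p) ** n
    print(f"{n:>5} messages -> {at_least_one:.1%} chance of at least one slip")
```

By that rough math, a thousand-message chat has better than even odds of at least one slip, which fits with the point below about very long sessions.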

  • Also, the longer the context window, the more likely the LLM is to become deranged or ignore its safety training. Frequently, people with a questionable dependence on AI stay in the same chat indefinitely, because that's where the LLM has developed the idiosyncrasies the user prefers.