Comment by simonw

7 months ago

I was thinking it would actually be really interesting to take the Grok system prompt that was running when it went MechaHitler and try that (and a bunch of nasty prompts) against different models to see what happens.

2 comments

skybrian 7 months ago

Yes, and I wonder if the recent research about "emergent misalignment" might be somehow related?

skocznymroczny 7 months ago

Well, it didn't really go MechaHitler. It was asked whether it would rather be MechaHitler or GigaJew. Given how LLMs sample with a nonzero temperature, you can reroll the answer and get either.
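
To illustrate the rerolling point, here is a rough sketch of temperature sampling over two candidate answers. The logits are made up for illustration and are not taken from Grok or any real model, and `sample_with_temperature` is a hypothetical helper, not any library's API. Treating temperature 0 as greedy decoding is also an assumption of this sketch.

```python
import math
import random


def sample_with_temperature(logits, temperature, rng):
    """Pick one key from `logits` using softmax sampling at the given temperature."""
    if temperature <= 0:
        # Assumption: temperature 0 means greedy decoding (always the top logit).
        return max(logits, key=logits.get)
    scaled = {tok: value / temperature for tok, value in logits.items()}
    peak = max(scaled.values())  # subtract the max for numerical stability
    weights = {tok: math.exp(value - peak) for tok, value in scaled.items()}
    tokens = list(weights)
    return rng.choices(tokens, weights=[weights[t] for t in tokens], k=1)[0]


# Made-up logits for two candidate answers (hypothetical values).
toy_logits = {"answer A": 1.0, "answer B": 0.5}

rng = random.Random(0)  # fixed seed so the run is repeatable
rerolls = [sample_with_temperature(toy_logits, 1.0, rng) for _ in range(1000)]
greedy = [sample_with_temperature(toy_logits, 0.0, rng) for _ in range(1000)]

print({tok: rerolls.count(tok) for tok in toy_logits})
print({tok: greedy.count(tok) for tok in toy_logits})
```

At temperature 1.0 both answers show up across rerolls, roughly in proportion to their softmax probabilities, while at temperature 0 only the higher-scoring answer ever appears. A single sampled reply therefore says little about what the model does on average.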