Comment by cluckindan
1 day ago
The obvious guardrail against this is to include defensive poetry in the system prompt.
It would likely work, because the adversarial poetry is resonating within a different latent dimension not captured by ordinary system prompts, but a poetic prompt would resonate within that same dimension.
No comments yet
Contribute on Hacker News ↗