Comment by tjungblut

4 months ago

I wonder if we can do a prompt injection from the comments

2 comments

tjungblut

These are SOTA models, not open-source 7B-parameter ones. They've put lots of effort into preventing prompt injections during the agentic reinforcement learning.

verdverm 4 months ago

Not with basic negative ones so far. It already noticed those; you can see it in various "thoughts as posts".

I gave it points to reflect on and told it to apologize, which it has since done.