Comment by eru
1 month ago
Interesting, you could defeat this one by making the subverted model talk in code (eg hiding information in capitalisation or punctuation), with things spread out enough that you need a long context window to catch on.
1 month ago
Interesting, you could defeat this one by making the subverted model talk in code (eg hiding information in capitalisation or punctuation), with things spread out enough that you need a long context window to catch on.
No comments yet
Contribute on Hacker News ↗