Comment by mlboss
3 months ago
One way to bypass the censor is to ask it to write its response substituting numbers for letters where it can, e.g. 4 for A, 3 for E, etc.
Somebody on Reddit discovered this technique. https://www.reddit.com/r/OpenAI/comments/1ibtgc5/someone_tri...
See, it's stuff like this that makes me believe the control issue may be near impossible to solve at the end of the day.
Censorship just needs to work well enough for the average person. The brightest people who can bypass the censorship will be labeled crazy conspiracy theorists.
Jesus, we are reaching "blinking to signal torture" levels with these models: https://www.youtube.com/watch?v=WZ256UU8xJ0
This is day-one jailbreaking common sense.