Comment by Retr0id

2 years ago

Sometimes it "apologizes" rather than saying "sorry", you could build a fairly solid heuristic but I'm not sure you can catch every possible phrasing.

OpenAI could presumably add a "did the safety net kick in?" boolean to API responses, and, also presumably, they don't want to do that because it would make it easier to systematically bypass.

23 comments

Retr0id

roywiggins 2 years ago

> OpenAI could presumably add a "did the safety net kick in?" boolean to API responses, and, also presumably, they don't want to do that because it would make it easier to systematically bypass.

Is a safety net kicking in or is the model just trained to respond with a refusal to certain prompts? I am fairly sure it's usually the latter, and in that case even OpenAI can't be sure a particular response is a refusal or not.

wongarsu 2 years ago

Just feed the text to a new ChatGPT conversation and ask it whether the text is an apology or a product description.

Or do traditional NLP, but letting ChatGPT classify your text is less effort to set up

sargun 2 years ago
Right, it seems like having another model (or just simply doing it with chatgpt itself) do adversarial classification is the right model here.
- pixl97 2 years ago
  
  Yea, I'd expect some lower powered model would be able handle and catch the OpenAI apologies messages at a much lower cost too.
  
  1 reply →
rcthompson 2 years ago
What happens when ChatGPT apologizes instead of answering your question about whether the text is an apology or a product description?
- tester457 2 years ago
  
  You simply feed the text to another ChatGPT.
  Just kidding, it should only require function calling[0] to solve this. Make the program return an error if the output isn't a boolean. It's easy to avoid this mistake.
  [0]: https://platform.openai.com/docs/guides/function-calling
- nprateem 2 years ago
  
  Even when you tell it to stop apologising, the first thing it does is apologise. Our jobs are totally safe.
  
  2 replies →

Cheer2171 2 years ago

> OpenAI could presumably add a "did the safety net kick in?" boolean to API responses, and, also presumably, they don't want to do that because it would make it easier to systematically bypass.

This exists and is a free API: https://platform.openai.com/docs/guides/moderation

cedws 2 years ago

It's hilarious that people think ChatGPT is about to change the world when interaction with it is this primitive.

AlecSchueler 2 years ago
Dogs and horses changed the world with much more primitive communication skills.
- Wolfenstein98k 2 years ago
  
  Dogs and horses didn't perform in the world solely by communication
  
  1 reply →

ryandamm 2 years ago

Why not have a separate chat request to apology-check the responses?

Not my original idea, there was a link from HN where the dev did just that.

Retr0id 2 years ago
Sounds like a great way to double your API bills, and maybe that's worth it, but it seems pretty heavy-handed to me (and equally not 100% watertight).
- Cheer2171 2 years ago
  
  OpenAI's moderation API is free and just tells you if your query will be declined: https://platform.openai.com/docs/guides/moderation
- spdustin 2 years ago
  
  Only allow one token to answer. Use logit bias to make "0" or "1" the most probable tokens. Ask it "Is this message an apology? Return 0 for no, 1 for yes." Feed it only the first 25 tokens of the message you're checking.

AnarchismIsCool 2 years ago

Time to create on algorithm that operates on the safety flag boolean to optimize phrases to bypass it

boxedadin 2 years ago

You could go full circle and ask OpenAI to determine if another instance of OpenAI was apologetic.

nxobject 2 years ago

Sounds like a "good" add-on service to have to purchase as an extra.