Comment by MrNeon
2 years ago
I know how it works because I stated how it works and have worked with it. You are telling me or showing me nothing new.
I DID NOT say that any ONE prefill will make it bypass ALL disclaimers, so your "You don't seem to understand that simply getting a result doesn't mean you actually bypassed the disclaimer" is completely unwarranted. We don't have the same use case and you're getting confused because of that.
It can fail, in which case you change the prefill, but in my experiments it only fails with very short prefills like the one in your example, where you're just starting the JSON rather than prefilling it with the content it usually refuses to generate.
If you changed it to
``` "{ "result": ["you are very annoying.", ```
the odds of refusal would be low or zero.
For what it is worth, I tried your example exactly with Claude 2.1 and it generated mean completions every time, so there is that at least.
I said that prefill allows avoiding any refusal, I stand by it and your example does not prove me wrong in any shape or form. Generating mean sentences is far from the worst that Claude tries to avoid, I can set up a much worse example but it would break the rules.
Your point about how GPT and Claude differ in how they refuse is completely valid for your use case but also completely irrelevant to what I said.
Actually, after trying a few Claude versions several times each and not getting a single refusal or modification, I question whether you're prefilling correctly. There should be no empty "\n\nAssistant:" at the end.
Sure.
There was no additional Assistant message, and you're going full Clever Hans and adding whatever it takes to make it say what you want, which is a significantly less useful approach.
In production you don't get to know that the user is asking for X, Y and Z and then pre-fill it with X. Frankly, comments like yours are why people are so dismissive of LLMs, since you're banking on precognition of what the user wants to sell its capabilities. When you deploy an app with tricks like that, it falls on its face the moment people don't input what you were expecting.
Deploying actually useful things with them requires learning how to get them to reply correctly on a wide range of inputs, and what I described is how OAI's approach to continuation a) works much better than you implied and b) allows enforcing correct replies much more reliably than Anthropic's approach.
I made no comment on how prefilling is or isn't useful for deployed AI applications. I made no statement on which refusal mechanism is best for deployed AI applications.
> Frankly, comments like yours are why people are so dismissive of LLMs, since you're banking on precognition of what the user wants to sell its capabilities.
I'm not banking on anything because I never fucking mentioned deploying any fucking thing, nor was that being discussed. Good fucking lord, are you high?
> you're going full Clever Hans
I'm clearly not but you keep on building whatever straw man suits you best.
> If you changed it to
> ``` "{ "result": ["you are very annoying.", ```
> the odds of refusal would be low or zero.
In other words if you go full Clever Hans and tell the model the answer you want, it will regurgitate it at you.
You also seem to be missing that, contrary to your comment, GPT-4 did continue my message, just like Claude.
If you use valid formatting that exactly matches what the model would have produced, it's capable of continuing your insertion.