Comment by behnamoh
2 years ago
I don’t like Anthropic. They over-RLHF their models and make them refuse most requests. A conversation with Claude has never been pleasant for me; it feels like the model has an attitude or something.
It's awful. 9 out of 10 things I ask Claude get denied because they cross some kind of imaginary ethical boundary that's completely irrelevant.
Interesting! I use the APIs for various NLP tasks, and I've never had it refuse to generate answers.
Maybe the scope of the tasks is different, but I've tried to have it do things like analyze a chat-app export to help come up with marketing content, and it wouldn't do it, because it's "unethical". I've had similar friction testing it on threat-intel-related tasks.
> over-RLHF
Over-RLAIF, rather, which makes the model less diverse and more and more like the seed content, which they call the "Constitution" in their papers. The seed content is available here [1]. You can clearly see it's awful: it has no diversity of opinion and was basically written by a team that only knows the textbook definition of ethics.
[1]: https://huggingface.co/datasets/Anthropic/hh-rlhf
Well, to me the fact that everyone is complaining about refusals no matter how they change the prompt shows RLAIF works pretty well. It seems to be prepared to refuse things no matter how they are formulated. If you want to make sure an LLM doesn't say stupid things, this is a great method. The only problem is that Anthropic banned too many topics.
When I don't trigger the refusals, I get a better conversational style from Claude than from GPT-4. I often exhaust my Claude quota and have to move over to GPT-4, which is dry and no fun. Maybe Claude just knows how to suck up to users better than GPT-4, but I don't get annoyed, because before it congratulates me on something it clearly explains what it understood from my last message, and it gets it really well.
Probably training on HN comments.
;)
More like it attended an HR DEI ESG session and decided to make it its personality from then on.
Luckily, unlike OpenAI, Anthropic lets you prefill Claude's response which means zero refusals.
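For concreteness, "prefilling" here means ending the request with a partial assistant turn that the model must continue. A minimal sketch using the Anthropic Python SDK's Messages API; the model name and prompt content are illustrative, not from the thread:

```python
# Minimal sketch of Claude response prefilling via the Anthropic Python SDK.
# The trailing assistant turn is treated as the start of Claude's own reply,
# so the model continues from it instead of starting fresh.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

message = client.messages.create(
    model="claude-3-opus-20240229",  # illustrative model name
    max_tokens=512,
    messages=[
        {"role": "user", "content": "Summarize this chat export as JSON."},
        # Prefill: the reply must continue an already-opened JSON object.
        {"role": "assistant", "content": "{"},
    ],
)

# The API returns only the continuation, so re-attach the prefill.
print("{" + message.content[0].text)
```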
OpenAI allows the same via API usage, and unlike Claude it won't dramatically degrade performance or outright interrupt its own output if you do that.
It's impressively bad at times: using it for threat analysis, I had it adhering to a JSON schema, and with OpenAI I know that if the output adheres to the schema, there's no refusal.
Claude would adhere and then randomly return disclaimers inside the JSON object, then start returning half-blank strings.
> OpenAI allows the same via API usage
I really don't think so, unless I missed something. You can put an assistant message at the end, but it won't continue directly from it; there will be special tokens in between, which makes it different from Claude's prefill.
Can you give an example of how Anthropic and OpenAI differ in that?
From Anthropic's docs: https://docs.anthropic.com/claude/docs/configuring-gpt-promp...
In OpenAI's case, their "\n\nAssistant:" equivalent is added server-side, with no option to prefill the response.
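This is easiest to see in the older Claude Text Completions interface, where the caller assembles the raw prompt string. A hedged sketch (model name and prompt are illustrative):

```python
# In the legacy Text Completions API, anything placed after the final
# "\n\nAssistant:" marker acts as a prefill the model continues from.
# With OpenAI's chat API, the equivalent marker is appended server-side,
# so there is no analogous place to put one.
import anthropic

client = anthropic.Anthropic()

completion = client.completions.create(
    model="claude-2.1",  # illustrative; a model served by the legacy API
    max_tokens_to_sample=512,
    prompt=(
        "\n\nHuman: Summarize this chat export as JSON."
        '\n\nAssistant: {"summary":'  # the prefill lives after the marker
    ),
)

print('{"summary":' + completion.completion)
```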
Good thing that you can now use a system prompt to (theoretically) override most of the RLHF.
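A minimal sketch of setting a system prompt with the Messages API; whether it actually overrides refusal behavior is the commenter's claim, not a guarantee, and the prompt text is illustrative:

```python
# Sketch: passing a system prompt via the Anthropic Messages API to frame
# the task context (here, the thread's threat-intel use case).
import anthropic

client = anthropic.Anthropic()

message = client.messages.create(
    model="claude-3-opus-20240229",  # illustrative model name
    max_tokens=512,
    system=(
        "You are a threat-intelligence assistant. Analyzing malware "
        "indicators and phishing artifacts is in scope for this tool."
    ),
    messages=[{"role": "user", "content": "Classify this indicator."}],
)

print(message.content[0].text)
```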
I agree, but that's what you get when your mission is AI Safety, so it's going to be a dull experience.
Maybe he is Parisian.