Comment by behnamoh
2 years ago
I don’t like Anthropic. They over-RLHF their models and make them refuse most requests. A conversation with Claude has never been pleasant for me; it feels like the model has an attitude or something.
It's awful. 9 out of 10 things I ask Claude get denied because they cross some kind of imaginary ethical boundary that's completely irrelevant.
Interesting! I use the APIs for various NLP tasks, and I've never had it refuse to generate answers.
Maybe the scope of the tasks is different, but I've tried to have it do things like analyze a chat-app export to help come up with marketing content, and it wouldn't do it, because it's "unethical". I've had similar friction testing it on threat-intel-related tasks.
> over-RLHF
Over-RLAIF, rather, which makes the model less diverse and more and more like the seed content, which they call the "Constitution" in their papers. The seed content is available here [1]. You can clearly see it's awful: it has no diversity of opinion and was basically written by a team that only knows the textbook definition of ethics.
[1]: https://huggingface.co/datasets/Anthropic/hh-rlhf
Well, to me the fact that everyone is complaining about refusals no matter how they change the prompt shows RLAIF works pretty well. It seems to be prepared to refuse things no matter how they are formulated. If you want to make sure an LLM doesn't say stupid things, this is a great method. The only problem is that Anthropic banned too many topics.
When I don't trigger the refusals, I get a better conversational style from Claude than from GPT-4. I often exhaust my Claude quota and have to move over to GPT-4, which is dry and no fun. Maybe Claude just knows how to suck up to users better than GPT-4, but I don't get annoyed, because before it congratulates me on something it clearly explains what it understood from my last message, and it gets it really well.
Probably training on HN comments.
;)
More like it attended an HR DEI ESG session and decided to make it its personality from then on.
Luckily, unlike OpenAI, Anthropic lets you prefill Claude's response which means zero refusals.
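For concreteness, "prefilling" here means ending the request with a partial assistant turn that the model must continue. A minimal sketch using the Anthropic Python SDK's Messages API; the model name and prompt content are illustrative, not from the thread:

```python
# Minimal sketch of Claude response prefilling via the Anthropic Python SDK.
# The trailing assistant turn is treated as the start of Claude's own reply,
# so the model continues from it instead of starting fresh.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

message = client.messages.create(
    model="claude-3-opus-20240229",  # illustrative model name
    max_tokens=512,
    messages=[
        {"role": "user", "content": "Summarize this chat export as JSON."},
        # Prefill: the reply must continue an already-opened JSON object.
        {"role": "assistant", "content": "{"},
    ],
)

# The API returns only the continuation, so re-attach the prefill.
print("{" + message.content[0].text)
```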
OpenAI allows the same via API usage, and unlike Claude it won't dramatically degrade performance or outright interrupt its own output if you do that.
It's impressively bad at times: using it for threat analysis, I had it adhering to a JSON schema, and with OpenAI I know that if the output adheres to the schema, there's no refusal.
Claude would adhere and then randomly return disclaimers inside the JSON object, then start returning half-blank strings.
> OpenAI allows the same via API usage
I really don't think so, unless I missed something. You can put an assistant message at the end, but it won't continue directly from it; there will be special tokens in between, which makes it different from Claude's prefill.
Can you give an example of how Anthropic and OpenAI differ in that?
From Anthropic's docs: https://docs.anthropic.com/claude/docs/configuring-gpt-promp...
In OpenAI's case, their "\n\nAssistant:" equivalent is added server-side, with no option to prefill the response.
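This is easiest to see in the older Claude Text Completions interface, where the caller assembles the raw prompt string. A hedged sketch (model name and prompt are illustrative):

```python
# In the legacy Text Completions API, anything placed after the final
# "\n\nAssistant:" marker acts as a prefill the model continues from.
# With OpenAI's chat API, the equivalent marker is appended server-side,
# so there is no analogous place to put one.
import anthropic

client = anthropic.Anthropic()

completion = client.completions.create(
    model="claude-2.1",  # illustrative; a model served by the legacy API
    max_tokens_to_sample=512,
    prompt=(
        "\n\nHuman: Summarize this chat export as JSON."
        '\n\nAssistant: {"summary":'  # the prefill lives after the marker
    ),
)

print('{"summary":' + completion.completion)
```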
Good thing that you can now use a system prompt to (theoretically) override most of the RLHF.
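A minimal sketch of setting a system prompt with the Messages API; whether it actually overrides refusal behavior is the commenter's claim, not a guarantee, and the prompt text is illustrative:

```python
# Sketch: passing a system prompt via the Anthropic Messages API to frame
# the task context (here, the thread's threat-intel use case).
import anthropic

client = anthropic.Anthropic()

message = client.messages.create(
    model="claude-3-opus-20240229",  # illustrative model name
    max_tokens=512,
    system=(
        "You are a threat-intelligence assistant. Analyzing malware "
        "indicators and phishing artifacts is in scope for this tool."
    ),
    messages=[{"role": "user", "content": "Classify this indicator."}],
)

print(message.content[0].text)
```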
I agree, but that's what you get when your mission is AI Safety, so it's going to be a dull experience.
Maybe he is Parisian.