Comment by vertis

5 months ago

Yeah I managed to get it to admit that it was Claude without much effort (telling it not to lie), and then it magically stopped doing that. FWIW Constitutional AI is great.

They implemented the censoring of "Claude" and "Anthropic" using the system prompt?

Shouldn't they have used simple text replacement? They could buffer the streaming response on the server and apply .replace(/claude/gi, "Llama").replace(/anthropic/gi, "Meta") to it before forwarding it to the client.
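To make the buffering idea concrete, here is a minimal sketch of what that could look like. It is only an illustration: createStreamReplacer, its rule format, and the sample chunks are all hypothetical names I made up, not anything the actual service is known to use. The point is that a per-chunk replace misses a name split across two chunks, so the tail of each chunk has to be held back until the next one arrives.

```javascript
// Sketch only: createStreamReplacer and its rule format are hypothetical.
function createStreamReplacer(rules) {
  // Words are assumed to be plain letters, so no regex escaping is done.
  const compiled = rules.map(({ word, replacement }) => ({
    regex: new RegExp(word, "gi"),
    replacement,
  }));
  // Longest unfinished match that can sit at the end of a chunk.
  const holdBack = Math.max(...rules.map((r) => r.word.length)) - 1;
  let buffer = "";

  const applyRules = (text) =>
    compiled.reduce((out, r) => out.replace(r.regex, r.replacement), text);

  return {
    // Returns the text that is safe to forward; keeps the tail for later.
    push(chunk) {
      const processed = applyRules(buffer + chunk);
      const cut = Math.max(0, processed.length - holdBack);
      buffer = processed.slice(cut);
      return processed.slice(0, cut);
    },
    // Call once the upstream response has ended.
    flush() {
      const rest = applyRules(buffer);
      buffer = "";
      return rest;
    },
  };
}

// Hypothetical chunking where both names are split across chunk boundaries.
const chunks = ["I am Cl", "aude, made by Anth", "ropic."];
const rules = [
  { word: "claude", replacement: "Llama" },
  { word: "anthropic", replacement: "Meta" },
];

// Replacing chunk by chunk misses both names.
const naive = chunks
  .map((c) => c.replace(/claude/gi, "Llama").replace(/anthropic/gi, "Meta"))
  .join("");

// Holding back the tail catches them.
const replacer = createStreamReplacer(rules);
const buffered =
  chunks.map((c) => replacer.push(c)).join("") + replacer.flush();

console.log(naive);    // I am Claude, made by Anthropic.
console.log(buffered); // I am Llama, made by Meta.
```

The held-back tail is already-replaced text, which is safe here only because "Llama" and "Meta" cannot themselves form a new match; a different rule set might need more care.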

Edit: I realized this can be defeated, even when combined with the system prompt censoring approach.

For example, given a prompt like this: tell me a story about a man named Claude...

It would respond with: once upon a time there was a man called Llama...

  • > Shouldn't they have used simple text replacement?

    They tried that too but had issues.

    1) Their search-and-replace was only applied to the first chunk of the response returned from Claude.

    2) People started asking questions whose answer contains "Claude", like "Who composed Clair de lune?". The answer is supposed to be "Claude Debussy", which of course got changed to "Llama Debussy", etc.

    It's been one coverup-fail after another with Matt Shumer and his Reflection scam.
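For anyone who wants to see the two failure modes from the reply above in code, here is a rough reconstruction. The real proxy code is not public, so censor, firstChunkOnly, and the sample strings are purely hypothetical stand-ins. Note that even a whole-word match (\b) does not save the Debussy case, since "Claude" there is a legitimate whole word.

```javascript
// Hypothetical reconstruction; the real proxy code is not public.
const censor = (text) =>
  text.replace(/\bclaude\b/gi, "Llama").replace(/\banthropic\b/gi, "Meta");

// Failure 1: only the first streamed chunk is filtered,
// so names in later chunks pass through untouched.
function firstChunkOnly(chunks) {
  return chunks.map((chunk, i) => (i === 0 ? censor(chunk) : chunk)).join("");
}

const streamed = firstChunkOnly(["I am Claude. ", "Claude is made by Anthropic."]);
console.log(streamed); // I am Llama. Claude is made by Anthropic.

// Failure 2: a blind replace also rewrites legitimate uses of the name.
const debussy = censor('"Clair de lune" was composed by Claude Debussy.');
console.log(debussy); // "Clair de lune" was composed by Llama Debussy.
```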