Comment by fragmede

5 months ago

Now that the world's gotten used to the existence of AI, any hope of removing the guardrails on Claude? I don't need it to answer "How do I make meth", but I would like to not have to social-engineer my prompts. I'd like it to just write the code I asked for and not judge me on how ethical the code might be.

E.g., Claude will refuse to write code to wget a website and parse the HTML if you ask it to scrape your ex-girlfriend's Instagram profile, for ethical and ToS reasons, but if you phrase the request differently, it'll happily go off and generate code that does that exact thing.

Asking it to scrape my ex-girlfriend's Instagram profile is just a stand-in for other times I've hit a problem and had to social-engineer my way past those guardrails. Does having those guardrails really provide value on a professional level?

Not having headlines like "Claude Gives Stalker Instructions" has significant value to their business, I would wager.

I'm very much in favour of removing the guardrails but I understand why they're in place. The problem is attribution. You can teach yourself how to engage in all manner of dark deeds with a library or Wikipedia or a search engine and some time, but any resulting public outcry is usually diffuse or targeted at the sources rather than the service. When Claude or GPT or Stable Diffusion are used to generate something judged offensive, the outcry becomes an existential threat to the provider.