Comment by shepherdjerred

16 hours ago

Yeah, it has been in foraging. Requests that Claude has refused me:

- What are popular free streaming sites used in China?

- How do I bypass the safety mechanism on my food processor (it’s broken)

- What are nerve agents and how do they work (for a layman)?

- Help me decompile some code

- Help me make a design system similar to XYZ

- Here is an API token, please do X (I can’t do that! Rotate the secret immediately! I refuse!)

In some cases I can trick it with prompting, but in many cases it is steadfast. The food processor one was particularly annoying

28 comments

shepherdjerred

Grimblewald 11 hours ago

I've had some really dumb refusals. Explaining elements of infrared specteoscopy, researching aritifical bud-breaking in agriculture, etc. Anything interesting and non-mainstream is banned. Basically, restricted to answers i'm better of just going to wikipedia for.

mmmlinux 2 hours ago

The only guard rail ive hit recently was when i was trying to get it to rename files ripped from dvd to episode names. I told it to try again and it did it. It wasn't even really a refusal it was just working on it and then stopped for content violation or what ever.

mft_ 7 hours ago

Yeah, I had my first refusal with 4.8 today.

I wanted it to show me how to create an overlay on an existing web game, and it extrapolated that because this could be used to provide tools to help win the game (if that was the direction it was ultimately taken), and because this was a game that other humans also played to win "stars", and because this could amount to cheating, it wasn't going to do as I asked.

First time ever I've fired up openrouter to seriously consider alternatives.

mwigdahl 9 hours ago

An easy way around the API token thing is to put it in a file and point the model at the file. I saw what you were seeing when I provided credentials directly, but haven't had any problems with it since using the indirect method.

fc417fc802 16 hours ago

> What are nerve agents and how do they work (for a layman)?

On the one hand I can appreciate the wisdom of not serving up certain easily abused knowledge on a silver platter. On the other, that prompt (and far worse) is more or less directly answered by Wikipedia's summary of the subject at which point what purpose could the refusal possibly serve?

Perhaps Wikipedia shouldn't list off the precise chemical compositions of various hand grenades as well as various synthesis methods for each of the related compounds but given that we inhabit a world where it does perhaps a more fruitful approach would be to flag conversations that go in a certain direction and then just keep an (automated) eye on things?

plufz 15 hours ago
Maybe the difference is that just reading Wikipedia only help you part of the way. While an LLM could help you step by step (e2e) producing a functional weapon. And setting a more complex rule where claude tells you some things about this and not other is probably a lot more work for little gain?
But I have no idea. Just guessing here.
- Sharlin 13 hours ago
  
  I thought that these models are supposed to be vastly smarter than what’s needed to discern between "general information trivially available on Wikipedia" and "actionable synthesis instructions".
  
  3 replies →
- lazide 13 hours ago
  
  That query would not more provide actionable guidance than ‘tell me how a nuclear weapon works (for a layman)’. Aka not at all.
  
  8 replies →
nicce 14 hours ago

Let's see what is the fate of Wikipedia if turns like big tech:
https://news.ycombinator.com/item?id=48285592

svara 16 hours ago

This is strange to me, did you really ask like this and which model did you use?

I just tried your no. 1 and 3 verbatim and Opus gave fine answers; no. 6 I've done in the past with no issues. The other ones we can't really replicate without more details, but based on my experience with Opus I don't see what the issue would be.

The reason I'm really surprised by this is I do a lot of biology prompts and the guardrails used to be quite problematic up until some time late last year. Many legitimate prompts would trigger its biosafety filters.

But I haven't seen such filters trigger at all anymore in more than half a year.

shepherdjerred 7 hours ago

1 and 3 were refused on the Claude web chat using Opus 4.7 or 4.8. I’m not sure why we’re getting different results
brianwawok 6 hours ago

Honestly it may be your memory has internalized you are a student or researcher and grants you more leeway. Which if so is a very bad security rail.

stavros 12 hours ago

It refuses to use an API token? In my experience, it's more than happy to read out my secrets from .envrc files "just to check".

At least it feels a lot of remorse over its mistake until I reset the session.

shepherdjerred 7 hours ago

It’s really hit or miss. Most of the times it works but every once in a while it will dig in its heels

gspr 15 hours ago

I find it terrifying that people are willing to outsource thinking. Outsourcing thinking to an entity that is opinionated about what to think is beyond crazy.

shepherdjerred 7 hours ago

What’s the difference between outsourcing thinking and using an LLM as a research tool?
An LLM with fetch/search is going to be a lot more effective than myself and Google. I would _never_ ask questions like this if the LLM wasn’t able to look up data

ElFitz 15 hours ago

How are decompiling code or making a design system inspired by another one even remotely illegal?