Comment by cfcf14
1 day ago
I'm curious as to why 4.7 seems obsessed with avoiding any actions that could help the user create or enhance malware. The system prompts seem similar on the matter, so I wonder if this is an early attempt by Anthropic to use steering vector injection?
The malware paranoia is so strong that my company has had to temporarily block use of 4.7 on our IDE of choice, as the model was behaving in a concerningly unaligned way, as well as spending large amounts of token budget contemplating whether any particular code or task was related to malware development (we are a relatively boring financial services entity - the jokes write themselves).
In one case I actually encountered a situation where I felt that the model was deliberately failing to execute a particular task, and when queried, the tool output said it was trying to abide by directives about malware. I know that model introspection reporting is of poor quality and unreliable, but in this specific case I did not 'hint' it in any way. This feels qualitatively like Golden Gate Claude territory, hence my earlier contemplation of steering vectors. I've seen many other people online complaining about the malware paranoia too, especially on Reddit, so I don't think it's just me!
Note that these are the "chat" system prompts - although it's not mentioned I would assume that Claude Code gets something significantly different, which might have more language about malware refusal (other coding tools would use the API and provide their own prompts).
Of course it's also been noted that this seems to be a new base model, so the change could certainly be in the model itself.
Claude Code system prompt diffs are available here: https://cchistory.mariozechner.at/?from=2.1.98&to=2.1.112
(URL is to diff since 2.1.98 which seems to be the version that preceded the first reference to Opus 4.7)
The "Picking delaySeconds" section is quite enlightening.
I feel like this explains about a quarter to half of my token burn. It was never really clear to me whether tool calls in an agent session would keep the context hot or whether I would have to pay the entire context loading penalty after each call; from my perspective it's one request. I have Claude routinely do large numbers of sequential tool calls, or have long running processes with fairly large context windows. Ouch.
> The Anthropic prompt cache has a 5-minute TTL. Sleeping past 300 seconds means the next wake-up reads your full conversation context uncached — slower and more expensive. So the natural breakpoints:
> - *Under 5 minutes (60s–270s)*: cache stays warm. Right for active work — checking a build, polling for state that's about to change, watching a process you just started.
> - *5 minutes to 1 hour (300s–3600s)*: pay the cache miss. Right when there's no point checking sooner — waiting on something that takes minutes to change, or genuinely idle.
> *Don't pick 300s.* It's the worst-of-both: you pay the cache miss without amortizing it. If you're tempted to "wait 5 minutes," either drop to 270s (stay in cache) or commit to 1200s+ (one cache miss buys a much longer wait). Don't think in round-number minutes — think in cache windows.
> For idle ticks with no specific signal to watch, default to *1200s–1800s* (20–30 min). The loop checks back, you don't burn cache 12× per hour for nothing, and the user can always interrupt if they need you sooner.
> Think about what you're actually waiting for, not just "how long should I sleep." If you kicked off an 8-minute build, sleeping 60s burns the cache 8 times before it finishes — sleep ~270s twice instead.
> The runtime clamps to [60, 3600], so you don't need to clamp yourself.
If you're only used to the subscription plan, it's definitely not clear that every single interaction can trigger a full context load. It's all one session to most people, so as long as they keep replying quickly, or queue up a long arc of work, there's probably an expectation that you wouldn't incur that much context-loading cost. But this suggests that's not true at all.
5 replies →
No, you underestimate how huge the malware problem is right now. People try to publish fake download landing pages for shell scripts, or even for Claude Code, on https://playcode.io every day. They pay $$$ for Google ads to take the top position. How do Google Ads allow this? They can't verify every shell script.
No, I am not joking. Every time you install something, there is a risk you clicked on a wrong page with the exact same design.
He's not talking about malware awareness. He's talking about a bug I've seen too, where Claude adds extra malware-awareness justification turns for *every* tool call, like every file read of the repo we've been working on.
Also increasing numbers of attacks against Anna's Archive with fake cloned front end web GUIs leading to malware scripts.
Their marketing is going overtime into selling the image that their models are capable of creating uber sophisticated malware, so every single thing they do from here on out is going to have this fear mongering built in.
Every statement they make, hell even the models themselves are going to be doing this theater of "Ooooh scary uber h4xx0r AI, you can only beat it if you use our Super Giga Pro 40x Plan!!". In a month or two they'll move onto some other thing as they always do.
I started noticing this malware paranoia in 4.6. Boris was surprised to hear that in the comments, so it's probably a bug.
It was fixed for me by updating Claude and restarting
more likely the paranoia behavior was backported. current gen is already being used for bug bounties.
I "fixed" this for myself with tweakcc, which lets you patch the system prompts. I changed the malware part to just say "watch out for malware" and it stopped being unaligned.
They really should hand off read() tool calls to a lean cybersecurity model to identify if it's malware (separately from the main context), then take appropriate action.
The newest versions of the Claude Code package on npm just download the native executables and run that instead. Does tweakcc support that yet? Last time I tried it, there were some pretty huge error messages. For now I've been coping with a pinned version.
Presumably because it has become extremely good at writing software, and if it succeeds at helping someone spread malware, especially malware that could use Claude itself (via the local user's plan) to self-modify and "stay alive", it would be nearly impossible to put back in the bottle.
That would put itself back in the bottle by running killall to fix a stuck task, or deleting all core logic and replacing it with a to-do to fix a test.