Comment by roxolotl

6 hours ago

What does this mean?

> It's a different kind of tool doing a different kind of work, and that makes a clean apples-to-apples comparison to earlier models difficult.

They claim it’s a different kind of tool and then describe using it the same way you’d use any other model. This really felt way worse than the average Cloudflare blog and really just rehashed the Mythos announcement which had already called out the key parts being chaining and crafting examples.

JeremyNT 3 hours ago

> They claim it’s a different kind of tool and then describe using it the same way you’d use any other model. This really felt way worse than the average Cloudflare blog and really just rehashed the Mythos announcement which had already called out the key parts being chaining and crafting examples.

Hah, I was trying to parse this too.

Charitably, perhaps they're being vague about exactly what's different because they're still under NDA.

password4321 3 hours ago

> way worse than the average Cloudflare blog

How long has it been since you took your average? Lately all Cloudflare output has been heavily AI'd.

meander_water 1 hour ago

> the model has its own emergent guardrails that sometimes cause it to push back on legitimate security research requests. But as we found, these organic refusals aren’t consistent - the same task, framed differently or presented in a different context, could produce completely different outcomes as illustrated in the examples below.

This was new. I'm surprised that a model specifically designed for security research and gated to professionals is refusing legitimate requests.

_alternator_ 6 minutes ago

There's pretty strong evidence that (mis)alignment in one area creates (mis)alignment in others. The "aligned behavior" vectors for cybersecurity, bioweapons, and prejudice are not orthogonal to one another, so alignment in some will likely bleed into the others.

__natty__ 6 hours ago

Sounds different because it's a hidden advertisement, not a regular blog post.

grim_io 5 hours ago
But why would Cloudflare advertise Anthropic? They are competing with Anthropic by hosting open-weights models.
- Someone1234 5 hours ago
  
  https://www.cloudflare.com/press/press-releases/2025/cloudfl...
  

samstokes 1 hour ago

The post says they wrote a custom harness that orchestrates work between multiple separate model invocations. That is different from running Claude Code (which is a specific existing harness around the Claude models).

The post takes a while to get around to saying that, and could have included more detail besides the workflow diagram and table (which they flag as only "an example of" such a harness), but it does answer the question. It's a different kind of tool because it's a model rather than a harness+model pair.

smusamashah 3 hours ago

'It's not X, it's Y' is also a common LLM trope.

eikenberry 4 hours ago

My guess is because it is a model trained specifically for security/hacking. So comparing it to Opus, trained for chat/code/etc., is apples-to-oranges.

rs_rs_rs_rs_rs 3 hours ago

It is not; that's what surprised Anthropic employees too.

FergusArgyll 5 hours ago

I think what they might mean is:

Because of its capabilities, a new kind of harness can be built for it, so the entire system (model + harness) is a different kind of tool than, say, Claude Code.

Xirdus 5 hours ago
But did they build this different harness? And are they sure other models can't cope with it?
- roxolotl 5 hours ago
  
  Right, I expected the piece to transition into "and here's how we built a whole new thing for it," but it never did.
  