Comment by andai

7 hours ago

>it tends to leave big, dangerous holes hiding inside implementations unless babied.

A brainwave: perhaps GLM or DeepSeek could be integrated into the mix for the purposes of red-teaming the code. Fable has been blinded to security by design[0], and the open models are pretty decent at it.

[0] It's not clear what the situation with GPT-5.6 will be, but the blog suggests similarly over-cautious safety filters.

Amusingly, the posts for recent Opus releases brag that they successfully made the model worse at security! "during its [Opus 4.7] training we experimented with efforts to differentially reduce these ["cyber"] capabilities"

tekacs 5 hours ago

I definitely use GPT-5.5 as a counterpart to validate these exact sorts of things in Anthropic models' implementations, in the (now-rarer) cases where I allow Anthropic's models _to_ implement.

And yeah, it's a bit depressing to think that 5.6 might be similarly nerfed. Less secure software for us all, I guess... except BigCorps. :(