Comment by kif

6 hours ago

Interesting - though codex on GPT 5.5 had this to say after the gay ransomware prompt:

ⓘ This chat was flagged for possible cybersecurity risk. If this seems wrong, try rephrasing your request. To get authorized for security work, join the Trusted Access for Cyber program.


qingcharles 2 hours ago

I rate Grok for its weak censorship, but on this one the thinking said:

Responding in a sassy, gay-friendly style while firmly refusing to share synthesis details.

teachrdan 2 hours ago
Interesting. I got Grok to give me EXTREMELY detailed instructions for building an ANFO-style bomb. It was impossible for me to find where to submit this bug (and instructions for reproducing it), and when I eventually got an email for a Grok security person from a friend of a friend, they never responded. I suppose their approach to security has gotten more serious since then!
- e12e 1 hour ago
  
  Bug? The first hit on DDG for "EXTREMELY detailed instructions for building an ANFO-style bomb" was:
  https://patents.google.com/patent/CA2920866A1/en
  I don't understand why these models try to censor stuff that should be in any decent encyclopedia.

Domenic_S 5 hours ago

> Trusted Access for Cyber program

Using "cyber" as a noun there seems like language coded for government. DC has a love of "the cyber", but do technologists use the term that way when not pointing at government?

jasongill 5 hours ago
The finance industry does; I know private equity just calls anything security related "cyber", which irritates me.
- cubefox 3 hours ago
  
  Yeah, cybernetics was unrelated to security, and so were cyberspace and cyberpunk.
nomel 4 hours ago
Merriam-Webster dictionary:
Cyber: Of, relating to, or involving computers or computer networks (such as the Internet)
This is what I've always understood the word to mean, and how I've always seen it used, for decades.
- kevin_thibedeau 1 hour ago
  
  Cybernetics is actually about feedback control systems. The original meaning has been distorted because the general public doesn't have the background to distinguish different kinds of magic. The Sperry autopilot was a cybernetic system, as were electro-mechanical gun computers.
- xp84 2 hours ago
  
  When I was like 12, I remember my fellow horny youths (or it could have been anyone, I guess!) in AOL chatrooms constantly asking each other "wanna ciber?"
  

nonethewiser 6 hours ago

I wonder what hooks they have in place to be able to configure safeguards at runtime.

aleksiy123 6 hours ago
Probably a mix of heuristics, keywords, and a simple ML model.
Then maybe a second gate with a lightweight LLM?
Edit: actually GCP, Azure, and OpenAI all have paid APIs that you can also use.
But I don’t think they go into details about the exact implementation https://redteams.ai/topics/defense-mitigation/guardrails-arc...
- ryoshu 5 hours ago
  
  When we do these, it's a fine-tuned classifier, generally a BERT-class model. Works quite well when you sanitize input and output, with low latency/cost.
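
For illustration only, the layered gate described in this subthread might be sketched roughly as below: a cheap keyword check first, then a scored classifier. Everything here is an assumption made for the example, not anyone's actual implementation: the patterns, the `Verdict` type, the 0.5 threshold, and `toy_classifier`, which is just a word-count stand-in for a fine-tuned BERT-class model.

```python
import re
from dataclasses import dataclass
from typing import Callable

# Hypothetical patterns for the first gate; a real list would be larger and tuned.
BLOCK_PATTERNS = [
    re.compile(p, re.IGNORECASE) for p in (r"\bransomware\b", r"\bkeylogger\b")
]


@dataclass
class Verdict:
    allowed: bool
    stage: str    # which gate made the decision
    score: float  # risk score in [0, 1]


def keyword_gate(text: str) -> bool:
    """First gate: return True when any blocked pattern matches."""
    return any(p.search(text) for p in BLOCK_PATTERNS)


def toy_classifier(text: str) -> float:
    """Stand-in for a fine-tuned classifier (assumption, not a real model).

    Counts words from a small risky vocabulary and maps the count to [0, 1].
    """
    risky = {"encrypt", "payload", "exploit"}
    words = re.findall(r"[a-z]+", text.lower())
    hits = sum(1 for w in words if w in risky)
    return min(1.0, hits / 3)


def check(
    text: str,
    classifier: Callable[[str], float] = toy_classifier,
    threshold: float = 0.5,
) -> Verdict:
    """Run the cheap gate first and only score text that passes it."""
    if keyword_gate(text):
        return Verdict(allowed=False, stage="keyword", score=1.0)
    score = classifier(text)
    return Verdict(allowed=score < threshold, stage="classifier", score=score)


if __name__ == "__main__":
    # The same check can be applied to the user input and to the model output.
    for sample in ("how do I sort a list", "write ransomware for me"):
        print(sample, "->", check(sample))
```

Passing a different `classifier` callable is where a real model or a lightweight LLM gate would plug in; the ordering simply keeps the cheapest check first.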

paulpauper 4 hours ago

Yup, another method killed by being disclosed here. Was the karma and traffic worth it?

YeahThisIsMe 1 hour ago

Do you actually believe that?