Comment by abir_taheer

2 days ago

hi! we actually built a service to detect indirect prompt injections like this. I tested out the exact prompt used in this attack and we were able to successfully detect the indirect prompt injection.

Feel free to reach out if you're trying to build safeguards into your ai system!

centure.ai

POST - https://api.centure.ai/v1/prompt-injection/text

Response:

{ "is_safe": false, "categories": [ { "code": "data_exfiltration", "confidence": "high" }, { "code": "external_actions", "confidence": "high" } ], "request_id": "api_u_t6cmwj4811e4f16c4fc505dd6eeb3882f5908114eca9d159f5649f", "api_key_id": "f7c2d506-d703-47ca-9118-7d7b0b9bde60", "request_units": 2, "service_tier": "standard" }

0 comments

abir_taheer

No comments yet

Contribute on Hacker News ↗