Comment by abir_taheer
2 days ago
hi! we actually built a service to detect indirect prompt injections like this. I tested out the exact prompt used in this attack and we were able to successfully detect the indirect prompt injection.
Feel free to reach out if you're trying to build safeguards into your ai system!
centure.ai
POST - https://api.centure.ai/v1/prompt-injection/text
Response:
{ "is_safe": false, "categories": [ { "code": "data_exfiltration", "confidence": "high" }, { "code": "external_actions", "confidence": "high" } ], "request_id": "api_u_t6cmwj4811e4f16c4fc505dd6eeb3882f5908114eca9d159f5649f", "api_key_id": "f7c2d506-d703-47ca-9118-7d7b0b9bde60", "request_units": 2, "service_tier": "standard" }
No comments yet
Contribute on Hacker News ↗