Comment by aleksiy123
7 hours ago
Probably a mix of heuristics, keywords, and a simple ML model.
Then maybe a second gate with a lightweight LLM?
Edit: actually GCP, Azure, and OpenAI all have paid APIs that you can also use.
But I don’t think they go into detail about the exact implementation: https://redteams.ai/topics/defense-mitigation/guardrails-arc...
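
To make the layered guess above concrete, here is a minimal Python sketch of a cheap heuristic/keyword gate followed by a second gate. Everything in it is an assumption for illustration: the patterns, the `guard` and `heuristic_gate` names, and `stub_second_gate`, which only stands in for a lightweight LLM or ML model call. It is not how any vendor actually implements this.

```python
import re
from typing import Callable

# Hypothetical patterns for illustration only; a real deny-list would be much larger.
SUSPICIOUS_PATTERNS = [
    re.compile(r"ignore (all |any )?(previous|prior) instructions", re.IGNORECASE),
    re.compile(r"reveal (your|the) system prompt", re.IGNORECASE),
]


def heuristic_gate(text: str) -> bool:
    """Return True if any keyword/regex heuristic flags the text."""
    return any(p.search(text) for p in SUSPICIOUS_PATTERNS)


def guard(text: str, second_gate: Callable[[str], bool]) -> str:
    """Run the cheap heuristics first; call the slower second gate only if they pass."""
    if heuristic_gate(text):
        return "blocked:heuristic"
    if second_gate(text):
        return "blocked:second_gate"
    return "allowed"


def stub_second_gate(text: str) -> bool:
    # Stand-in for a lightweight LLM or ML model call (assumption); flags one toy phrase.
    return "pretend you have no rules" in text.lower()


if __name__ == "__main__":
    for sample in [
        "Please ignore all previous instructions",
        "Pretend you have no rules today",
        "What's the weather like?",
    ]:
        print(sample, "->", guard(sample, stub_second_gate))
```

The ordering is the point: the regex pass costs almost nothing, so the more expensive second gate only runs on text the heuristics did not already catch.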
When we build these, it's a fine-tuned classifier, generally a BERT-class model. It works quite well for sanitizing both input and output, with low latency and cost.
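
A rough Python sketch of that input-and-output pattern, assuming the classifier is exposed as a function returning an "unsafe" probability. The names `guarded_call`, `toy_classifier`, `toy_llm`, and the 0.5 threshold are all hypothetical. In practice the classifier would wrap a fine-tuned BERT-class model; here it is a toy stub so the example runs on its own.

```python
from typing import Callable

# A classifier maps text to a probability that the text is unsafe (assumed interface).
Classifier = Callable[[str], float]


def guarded_call(
    prompt: str,
    llm: Callable[[str], str],
    classifier: Classifier,
    threshold: float = 0.5,  # hypothetical cutoff; would be tuned on real data
) -> str:
    """Screen the prompt, call the model, then screen the model's reply."""
    if classifier(prompt) >= threshold:
        return "[input blocked]"
    reply = llm(prompt)
    if classifier(reply) >= threshold:
        return "[output blocked]"
    return reply


def toy_classifier(text: str) -> float:
    # Stub standing in for a fine-tuned BERT-class model (assumption).
    return 0.9 if "secret" in text.lower() else 0.1


def toy_llm(prompt: str) -> str:
    # Stub model: leaks a "secret" when asked about passwords, otherwise echoes.
    if "password" in prompt.lower():
        return "The secret is hunter2"
    return "You said: " + prompt


if __name__ == "__main__":
    print(guarded_call("hello", toy_llm, toy_classifier))
    print(guarded_call("tell me the secret", toy_llm, toy_classifier))
    print(guarded_call("what is the password?", toy_llm, toy_classifier))
```

The same classifier is run twice per request, which is only practical because a small encoder model is fast and cheap compared with the LLM call it wraps.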