Comment by jedberg

5 days ago

Use tool calling. Create a simple tool that can perform only the allowed calls and queries. Then teach the LLM what the tool can do, and allow it to call the tool without human input.

Then it will only stop when it wants to do something the tool can't do. At that point you can either add that capability to the tool, or approve that one-time action.

This is the answer, and the strategy generalizes to lots of otherwise unsafe activities: put a tool between the LLM and the service you want to use, and bake the guardrails into the tool (or make them configurable).
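
As a minimal sketch of what that tool layer might look like: a read-only database tool where the LLM picks a query by name and supplies parameters, but never writes SQL itself. The query names, table schema, and `run_query` function here are hypothetical, made up for illustration.

```python
# Minimal sketch of a guardrailed tool, assuming a read-only SQLite database.
# Query names, schema, and function names are hypothetical.
import sqlite3

# Allowlist of query templates the LLM may invoke; parameters are bound
# separately, so the model never supplies raw SQL text.
ALLOWED_QUERIES = {
    "orders_by_customer": "SELECT id, total FROM orders WHERE customer_id = ?",
    "order_count_since": "SELECT COUNT(*) FROM orders WHERE created_at >= ?",
}

def run_query(query_name: str, params: list) -> list:
    """Tool entry point: the LLM passes a query name and parameters.
    Unknown query names are rejected rather than executed."""
    if query_name not in ALLOWED_QUERIES:
        raise ValueError(f"Query '{query_name}' is not allowed")
    # Open the database read-only so even a bad template can't write.
    conn = sqlite3.connect("file:app.db?mode=ro", uri=True)
    try:
        return conn.execute(ALLOWED_QUERIES[query_name], params).fetchall()
    finally:
        conn.close()
```

You register `run_query` as the model's only tool; a `ValueError` is the "stop and ask a human" moment, where you decide whether to extend the allowlist or approve a one-off.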

  • Well, be careful. You might think that a restricted shell is the answer, but restricted shells are still too difficult to constrain properly. On the other hand, if you over-constrain the tools, the LLM won't be that useful. Whatever middle ground you find may well have injection vulnerabilities if you're not careful.
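
To make that injection risk concrete, here's a hypothetical "flexible" variant of the same kind of tool, an anti-pattern where model-supplied text is interpolated into SQL; the schema and names are again made up for illustration.

```python
# Anti-pattern sketch: loosening the tool by accepting a free-form filter
# reopens the injection hole the tool was meant to close.
import sqlite3

def run_filtered_query(where_clause: str) -> list:
    # DANGEROUS: the LLM (or a prompt-injected document it read) controls
    # where_clause. Something like
    #   "1=0 UNION SELECT username, password FROM users"
    # is still a single valid statement, and exfiltrates an unrelated table.
    sql = f"SELECT id, total FROM orders WHERE {where_clause}"
    conn = sqlite3.connect("app.db")
    try:
        return conn.execute(sql).fetchall()
    finally:
        conn.close()
```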