Comment by csemple
7 days ago
Yes, on your first point: "layer 1" isn't fundamentally new. It's applying standard systems-administration principles to a place where we currently trust prompts to do the work of permissions.
With the pattern I'm describing, you'd:

- Filter the tools list before the API call based on the user's permissions
- Pass only the allowed tools to the LLM

The model physically can't reason about calling tools that aren't in its context, which blocks it at the source.
We remove it at the infrastructure layer, vs. the prompt layer.
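Something like this minimal sketch, assuming an OpenAI-style `tools` array; the tool names, `ALL_TOOLS`, and the permission model are all hypothetical:

```python
# Sketch: filter tool definitions by permission *before* the request,
# so disallowed tools never appear in the model's context at all.
# ALL_TOOLS, the tool names, and the permission set are hypothetical.
from openai import OpenAI

ALL_TOOLS = {
    "send_email": {
        "type": "function",
        "function": {
            "name": "send_email",
            "description": "Send an email on the user's behalf.",
            "parameters": {
                "type": "object",
                "properties": {"to": {"type": "string"}, "body": {"type": "string"}},
                "required": ["to", "body"],
            },
        },
    },
    "delete_record": {
        "type": "function",
        "function": {
            "name": "delete_record",
            "description": "Permanently delete a customer record.",
            "parameters": {
                "type": "object",
                "properties": {"record_id": {"type": "string"}},
                "required": ["record_id"],
            },
        },
    },
}

def allowed_tools(user_permissions: set[str]) -> list[dict]:
    """Return only the tool definitions this user is permitted to use."""
    return [spec for name, spec in ALL_TOOLS.items() if name in user_permissions]

client = OpenAI()
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Please clean up this account."}],
    tools=allowed_tools({"send_email"}),  # delete_record is never in the model's context
)
```

The "refusal" happens in `allowed_tools()`, not in the prompt.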
On your second point, "layer 2": we're currently asking models to actively inhibit their training in order to obey a constricted action space. With Tool Reification, we'd train models to treat speech acts as tools and leverage that training, so the model doesn't have to "obey a no"; it simply fails to execute a "do."
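To make that concrete, here's a hedged sketch of a speech act expressed as a tool definition (the name and schema are hypothetical, not from any existing system): the drafted text is just an argument to a call, and "no" is simply the definition never being offered.

```python
# Hypothetical speech-act-as-tool definition. The model doesn't "decide
# whether it's allowed" to give a legal opinion; either this definition
# is in its context (and the executor surfaces the drafted text) or it isn't.
GIVE_LEGAL_OPINION = {
    "type": "function",
    "function": {
        "name": "give_legal_opinion",
        "description": "Draft a legal opinion for the user. Only offered in attorney-reviewed sessions.",
        "parameters": {
            "type": "object",
            "properties": {
                "question": {"type": "string", "description": "The user's legal question."},
                "opinion": {"type": "string", "description": "The drafted opinion text."},
            },
            "required": ["question", "opinion"],
        },
    },
}
```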
You might be overestimating the rigor of tool calls - they're ultimately just words the LLM generates. Also, I wonder if "tool stubs" might work better in your case: if an LLM calls give_medical_advice() and there's no permission, just have it do nothing? Either way you're still trusting an inherently random-sampled LLM to adhere to some rules. It's never going to be fully reliable, and nowhere near the determinism we've come to expect from traditional computing. Tool calls aren't some magic that gets around that.
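Rough sketch of what I mean by a stub (all names made up): the definition stays in the model's context, but the executor just returns nothing when the permission check fails.

```python
# Hypothetical stub dispatch: tools remain visible to the model, but
# execution is gated server-side. Without permission the call resolves
# to an empty, do-nothing result.
def draft_medical_advice(question: str) -> str:
    """Placeholder for the real handler."""
    return f"Guidance for: {question}"

def dispatch_tool_call(name: str, args: dict, user_permissions: set[str]) -> str:
    handlers = {
        "give_medical_advice": lambda a: draft_medical_advice(a["question"]),
    }
    if name not in user_permissions:
        return ""  # stub: silently do nothing
    return handlers[name](args)
```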
You're totally right: it's ultimately just probabilistic tokens. My thinking is that by physically removing the tool definition from the context window, we avoid state desynchronization. If the tool exists in the context, the model plans to use it; when it hits a "stub" error, it can enter a retry loop or hallucinate success. By removing the definition entirely, we align the model's world model with its permissions. It doesn't try to call a phone that doesn't exist.
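To illustrate the desync I'm worried about, here's a hypothetical OpenAI-style message trace with a stubbed call (the content and tool name are invented):

```python
# With a stub, the model has already committed to the call before the
# empty result comes back, so it's free to retry or to narrate success.
messages_with_stub = [
    {"role": "user", "content": "What dose should I take?"},
    {"role": "assistant", "content": None, "tool_calls": [{
        "id": "call_1", "type": "function",
        "function": {"name": "give_medical_advice",
                     "arguments": '{"question": "dosage"}'},
    }]},
    {"role": "tool", "tool_call_id": "call_1", "content": ""},  # stubbed no-op
    # The next assistant turn may retry the call or claim advice was given.
]
# With the definition filtered out up front, this branch can't exist:
# there is no give_medical_advice in the tools array to call in the first place.
```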