Comment by wackget

7 days ago

I don't want to trivialise someone's hard work but isn't this really just applying to LLMs what every responsible developer/sysadmin already knows: granular permissions, thoughtfully delegated?

You wouldn't give every user write access to a database in any system. I'm not sure why LLMs are a special case. Is it because people have been "trusting" the LLMs to self-enforce via prompt rules instead of actually setting up granular permissions for the LLM agent process? If so, that's a user training issue and I'm not sure it needs an LLM-specific article.

Secondly, FTA:

> You can stop a database delete with tool filtering, but how do you stop an AI from giving bad advice in text? By using a pattern I call “reifying speech acts into tools.”
>
> The Rule: “You may discuss symptoms, but you are forbidden from issuing a diagnosis in text. You MUST use the provide_diagnosis tool.”
>
> The Interlock:
>
> If User = Doctor: The tool exists. Diagnosis is possible.
> If User = Patient: The tool is physically removed.
>
> When the tool is gone, the model cannot “hallucinate” a diagnosis because it lacks the “form” to reason and write it on.

How is this any different from what I described above as trusting LLMs to self-enforce? You're not physically removing anything because the LLM can still respond with text. You're just trusting the LLM to obey what you've written. I know the next paragraph admits this, but I don't understand why it's presented like a new idea when it's not.

Yes, on your first point: "layer 1" isn't fundamentally new. It's applying standard systems administration principles to a setting where we're currently trusting prompts to do the work of permissions.

With the pattern I'm describing, you'd:

- Filter the tools list before the API call based on user permissions
- Pass only allowed tools to the LLM

The model physically can't reason about calling tools that aren't in its context, blocking it at the source.

We remove it at the infrastructure layer instead of at the prompt layer.
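
Roughly, as a sketch (the tool schemas, role names, and the ROLE_PERMISSIONS mapping here are made up for illustration, and the tools format assumes an OpenAI-style chat-completions API):

```python
# Every tool the backend knows about, keyed by name (hypothetical schemas).
ALL_TOOLS = {
    "provide_diagnosis": {
        "type": "function",
        "function": {
            "name": "provide_diagnosis",
            "description": "Issue a formal diagnosis for the current case.",
            "parameters": {
                "type": "object",
                "properties": {"diagnosis": {"type": "string"}},
                "required": ["diagnosis"],
            },
        },
    },
    "search_symptoms": {
        "type": "function",
        "function": {
            "name": "search_symptoms",
            "description": "Look up general information about a symptom.",
            "parameters": {
                "type": "object",
                "properties": {"query": {"type": "string"}},
                "required": ["query"],
            },
        },
    },
}

# Hypothetical role -> allowed-tools mapping; this is the permission check.
ROLE_PERMISSIONS = {
    "doctor": {"provide_diagnosis", "search_symptoms"},
    "patient": {"search_symptoms"},
}


def tools_for(role: str) -> list[dict]:
    """Return only the tool definitions this role is allowed to see."""
    allowed = ROLE_PERMISSIONS.get(role, set())
    return [spec for name, spec in ALL_TOOLS.items() if name in allowed]


# Whatever tools_for(role) returns is what gets passed as the `tools`
# argument of the model API call; for a patient, provide_diagnosis is
# never present in the request at all.
```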

On your second point, "layer 2": we're currently asking models to actively inhibit their training in order to obey a constrained action space. With Tool Reification, we'd be training the models to treat speech acts as tools and leveraging that training so the model doesn't have to "obey a no"; it simply fails to execute a "do."
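
Continuing the sketch above (the system prompt wording and the build_request helper are hypothetical, not from the article), the reified speech act looks roughly like:

```python
SYSTEM_PROMPT = (
    "You may discuss symptoms freely. To issue a diagnosis, you must call "
    "the provide_diagnosis tool; never state a diagnosis in plain text."
)


def build_request(role: str, user_message: str) -> dict:
    """Assemble a chat request where the speech act lives in the tool list."""
    return {
        "model": "some-model",  # placeholder model name
        "messages": [
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": user_message},
        ],
        # For a patient, provide_diagnosis is simply absent from this list,
        # so the model is not asked to "obey a no"; the "do" is not there.
        "tools": tools_for(role),
    }
```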

  • You might be overestimating the rigor of tool calls - they're ultimately just words the LLM generates. Also, I wonder if "tool stubs" might work better in your case: if an LLM calls give_medical_advice() and there's no permission, just have it do nothing? Either way you're still trusting an inherently random-sampled LLM to adhere to some rules. It's never going to be fully reliable, and nowhere near the determinism we've come to expect from traditional computing. Tool calls aren't some magic that gets around that.
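
    Something like this, as a rough sketch (give_medical_advice and the permission set are hypothetical):

    ```python
    def give_medical_advice(user_permissions: set[str], advice: str) -> str:
        """Stub variant: the tool stays defined in context, but without
        permission the handler does nothing instead of executing the act."""
        if "give_medical_advice" not in user_permissions:
            return ""  # no-op; nothing reaches the user
        return advice
    ```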

    • You’re totally right—it's ultimately just probabilistic tokens. I’m thinking that by physically removing the tool definition from the context window, we avoid state desynchronization. If the tool exists in the context, the model plans to use it. When it hits a "stub" error, it can enter a retry loop or hallucinate success. By removing the definition entirely, we align the model's World Model with its Permissions. It doesn't try to call a phone that doesn't exist.