
Comment by throwaway13337

2 days ago

Is there a reason why prompt injections in general are not solvable with task-specific layering?

Why can't the LLM break the task up into smaller components? The higher-level task LLM's context doesn't need to see what is beneath it in a free-form way - it can sanitize the return. This also has the side effect of limiting the context of the upper-level task-management LLM instance, so it can stay focused.

I realize the lower task could transmit free-form text up to the higher task, but the layers don't have to be written that way.

The argument against is that denying the upper-level LLMs free-form results could limit what they can do, but for a lot of tasks where security is important, that seems like an acceptable trade-off.

So you have some hierarchy of LLMs. The first LLM that sees the prompt is vulnerable to prompt injection.

  • The first LLM only knows how to delegate and cannot respond directly.

    • But it can be tricked into delegating incorrectly - for example, routing to the "allowed to use confidential information" agent instead of the "general purpose" agent.

    • It can still be prompt-injected into delegating in a different way than the user would expect or want.
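The residual risk above can be made concrete with a sketch, using a hypothetical `route` helper: even if the router's output is mechanically constrained to an allowlist of agents, the *choice* among allowed agents is still produced from attacker-influenced text.

```python
# Least-privilege allowlist: anything unrecognized falls back to "general".
ALLOWED_AGENTS = {"general", "confidential"}

def route(llm_choice: str) -> str:
    """Constrain the router LLM's raw output to a known agent name."""
    choice = llm_choice.strip().lower()
    return choice if choice in ALLOWED_AGENTS else "general"

# The structural check holds: arbitrary injected text can't name a
# nonexistent agent. But if an injected prompt convinces the router
# LLM to *emit* "confidential" for an attacker's request, route()
# passes it through unchanged. Output validation limits the blast
# radius; it doesn't verify intent.
```

This is why the reply's second bullet survives the allowlist fix: the validator can only check that a delegation is well-formed, not that it is the delegation the user wanted.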