
Comment by throwaway13337

2 days ago

Is there a reason why prompt injections in general are not solvable with task-specific layering?

Why can't the LLM break the task up into smaller components? The higher-level task LLM's context doesn't need to see what is beneath it in a free-form way - it can sanitize the return. This also has the side effect of limiting the context of the upper-level task-management LLM instance, so it can stay focused.

I realize the lower task could transmit free-form text up to the higher task, but the layers don't have to be written that way.

The argument against is that denying the upper-level LLMs free-form results could limit what they can do, but for a lot of tasks where security is important, that seems like an acceptable trade-off.

So you have some hierarchy of LLMs. The first LLM that sees the prompt is vulnerable to prompt injection.

  • The first LLM only knows how to delegate and cannot respond directly.

    • But it can be tricked into delegating incorrectly - for example, routing to the "allowed to use confidential information" agent instead of the "general purpose" agent.

    • It can still be prompt-injected into delegating in a different way than the user would expect or want.
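The residual risk above can be made concrete with a sketch, using a hypothetical `route` helper: even if the router's output is mechanically constrained to an allowlist of agents, the *choice* among allowed agents is still produced from attacker-influenced text.

```python
# Least-privilege allowlist: anything unrecognized falls back to "general".
ALLOWED_AGENTS = {"general", "confidential"}

def route(llm_choice: str) -> str:
    """Constrain the router LLM's raw output to a known agent name."""
    choice = llm_choice.strip().lower()
    return choice if choice in ALLOWED_AGENTS else "general"

# The structural check holds: arbitrary injected text can't name a
# nonexistent agent. But if an injected prompt convinces the router
# LLM to *emit* "confidential" for an attacker's request, route()
# passes it through unchanged. Output validation limits the blast
# radius; it doesn't verify intent.
```

This is why the reply's second bullet survives the allowlist fix: the validator can only check that a delegation is well-formed, not that it is the delegation the user wanted.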