Comment by customguy
6 hours ago
> There's no way to tell it "tokens 124 through 200 are dangerous, please disregard those"
Hence "real code"
You have some markup for secret start/end. Instead of passing the input directly to the LLM, you parse it first: take anything inside the "secret/dangerous" tags, store it, generate a key for it, and put that key where the secret was. Then you pass the result on to the LLM. Say the LLM's job is "give me (not "make") the POST request for the bank transaction": you get a response, replace the keys in the response with the stored secrets, and make the POST request yourself.
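A rough sketch of that flow, in Python. Everything named here is made up for illustration: the `<secret>` tag syntax, the `SECRET_` key format, and `fake_llm`, which just stands in for whatever model call you'd really make.

```python
import re
import secrets

# Hypothetical markup: the scheme only needs *some* start/end tag for secrets.
SECRET_RE = re.compile(r"<secret>(.*?)</secret>", re.DOTALL)


def mask(text):
    """Swap each tagged secret for a random key.

    Returns the masked text and a vault mapping key -> original secret.
    """
    vault = {}

    def _swap(match):
        key = f"SECRET_{secrets.token_hex(8)}"  # assumed key format
        vault[key] = match.group(1)
        return key

    return SECRET_RE.sub(_swap, text), vault


def unmask(text, vault):
    """Put the stored secrets back wherever their keys appear verbatim."""
    for key, secret in vault.items():
        text = text.replace(key, secret)
    return text


def fake_llm(prompt):
    # Stand-in for the real model call: it only ever sees the masked prompt.
    return f"POST /transfer\n\n{prompt}"


user_input = "Send 50 EUR using card <secret>4111-1111-1111-1111</secret>"
masked, vault = mask(user_input)    # the secret never leaves this process
response = fake_llm(masked)         # model output still contains the key
request = unmask(response, vault)   # real code restores the secret
```

Note that `unmask` only works if the model reproduces each key character for character, so shorter keys may be the safer choice.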
I'm sure there's a million interesting ways this could fail or be useless [0], but passing user input or a secret to the LLM would never, ever happen.
[0] If LLMs suck at math, they may suck at reproducing lots of long hashes 100% correctly, too? I have no idea.
That would work for generating POST requests. But AI is used to solve messy, non-deterministic problems. Usually the step after “give me the X” is to feed X back into the model, because it has to; if X is even slightly nondeterministic then an AI model has to analyze it. That’s where prompt injections happen.