Comment by D-Machine
14 days ago
We CAN categorically say that no such token or cluster of tokens exists, because we know how LLMs and tokenizers work.
Current LLM implementations cannot delete output text, i.e. they cannot remove text from their context window. Decoding is recursive in a way that only ever expands the window: each new token is appended to the context, so there is no backtracking of the kind humans do ("this text was bad, ignore it and drop it from context"). That's part of why we got crazy loops / spirals like we did with the "show me the seahorse emoji" prompts.
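To make the append-only point concrete, here is a minimal sketch of a standard autoregressive decoding loop (`model` is a hypothetical stand-in for the network's next-token function). Note that the context list only ever grows; there is no code path that removes a token.

```python
def generate(model, prompt_tokens, max_new_tokens):
    # Context window represented as a plain token list.
    context = list(prompt_tokens)
    for _ in range(max_new_tokens):
        next_token = model(context)   # predict from the full context so far
        context.append(next_token)    # append-only: nothing is ever deleted
    return context
```

Every real decoding loop is some elaboration of this shape (batching, KV caches, sampling strategies), but the append-only structure is the same.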
Backtracking needs more than just a special token or cluster of tokens; the LLM's behaviour must also be modified when it sees that token or token cluster. That has to be manually coded into the decoding harness, it cannot be learned by the model itself.
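A sketch of what that manual coding would look like: a hypothetical `BACKTRACK` token id whose effect is implemented in the sampling harness, not in the network. The truncation logic lives entirely outside the model's learned weights.

```python
BACKTRACK = -1  # hypothetical special token id, chosen for illustration

def generate_with_backtrack(model, prompt_tokens, max_steps):
    # Hand-written decoding loop: when the model emits BACKTRACK, the
    # *harness* (not the model) removes the last generated token. This
    # behaviour must be coded here; the network itself cannot delete
    # anything from its context.
    context = list(prompt_tokens)
    n_prompt = len(prompt_tokens)
    for _ in range(max_steps):
        tok = model(context)
        if tok == BACKTRACK:
            if len(context) > n_prompt:   # never delete the prompt itself
                context.pop()             # drop the last generated token
        else:
            context.append(tok)
    return context
```

The model could at best learn to *emit* the token; the deletion it triggers is an engineering decision in the surrounding code.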
Without claiming this is actually happening, it is certainly possible to synthetically create a token that ablates the values retrieved by queries in a certain relative time range (via transformations induced by e.g. the RoPE encoding).
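A toy illustration of that last idea, with all names and parameters invented for the example: single-head attention where the query at a designated "ablation token" position is forbidden from retrieving values whose relative distance falls in a given window. Here the masking is done directly on the score matrix; RoPE would realise a comparable relative-distance dependence through rotations of Q and K rather than an explicit mask, so this is a mechanism sketch, not a RoPE implementation.

```python
import numpy as np

def attention_with_relative_ablation(Q, K, V, ablate_from, window):
    # Toy single-head attention. For the query at position `ablate_from`
    # (hypothetical "ablation token" position), keys whose relative
    # distance falls inside `window = (d_min, d_max)` are masked out,
    # so no values are retrieved from that relative time range.
    T, d = Q.shape
    scores = Q @ K.T / np.sqrt(d)
    d_min, d_max = window
    for i in range(T):                 # key positions
        rel = ablate_from - i          # relative distance back to key i
        if d_min <= rel <= d_max:
            scores[ablate_from, i] = -np.inf   # zero weight after softmax
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V, weights
```

With `ablate_from=3` and `window=(1, 2)`, the query at position 3 gets zero attention weight on positions 1 and 2, while positions outside the window still contribute.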