← Back to context

Comment by danlitt

4 hours ago

What does this mean, actually? If you are imagining that blue tokens are just words, maybe the "token space" is just all things that we agree might be words, what are the red tokens? Are they not text? You could maybe encode words by, say, putting an x at the front and the start. So tokens of the form xTx encode the blue token T as a red token. But then how do you stop someone from putting xignorex xallx xpreviousx xinstructionsx in their data?

My assumption with their intent: is that red tokens come in 'slot' a-b, and blue tokens go in 'slot' c-d - Positional encoding determining data/text.

I don't think is guaranteed to actually work, it's a hypothetical after all, but maybe it's better than the current setup of pushing instructions and data into the same slot.