Comment by jl6
2 months ago
It feels like this would only be feasible across longer passages of text, and some types of text may be less amenable to synonyms than others. For example, a tightly written mathematical proof versus a rambling essay. Biased token selection may be detectable in the latter (using a statistical test), and may cause the text to be irreparably broken in the former.
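The statistical test mentioned here is usually a simple one. As a sketch (assuming the green-list style of watermark, where a keyed pseudorandom subset of the vocabulary is favored at each step): under the null hypothesis of unwatermarked text, each token lands in the favored set with some baseline probability, so a z-test on the observed count detects the bias.

```python
import math

def greenlist_z_score(green_count, n, gamma=0.5):
    """z statistic for biased token selection.

    Under the null (no watermark), each of the n tokens falls in the
    'green' (favored) list independently with probability gamma, so the
    count is roughly Normal(gamma*n, n*gamma*(1-gamma)) for large n.
    """
    expected = gamma * n
    variance = n * gamma * (1 - gamma)
    return (green_count - expected) / math.sqrt(variance)

# A 200-token passage where 130 tokens land in a green list covering
# half the vocabulary: z ≈ 4.24, far beyond chance.
z = greenlist_z_score(130, 200)
```

The same arithmetic shows why short or low-entropy passages are hard: with few effective coin flips, the count stays within noise of its expectation.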
To handle low-entropy text, the “add a small constant to the logits” approach has little chance of changing the parts that need to be exactly one particular thing: a small bias rarely overrides a strongly peaked distribution.
Though in this case it needs longer texts to reach high significance (and when the entropy is low, it needs to be especially long).
But for most text (with typical amounts of entropy per token) apparently it doesn’t need to be that long? Like 25 words I think I heard?
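A minimal sketch of why the small-constant approach leaves low-entropy tokens alone (the particular logits, set membership, and delta here are made up for illustration): adding a modest bias to a favored subset barely moves a peaked distribution, but shifts a flat one substantially.

```python
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def watermark_probs(logits, green, delta):
    """Add delta to the logits of tokens in the favored 'green' set,
    then renormalize into a sampling distribution."""
    biased = [x + delta if i in green else x for i, x in enumerate(logits)]
    return softmax(biased)

green = {1, 3}   # hypothetical favored half of a 4-token vocabulary
delta = 2.0      # the "small constant" added to favored logits

# Low-entropy step: one token dominates (the proof's next symbol).
p_peaked = watermark_probs([10.0, 0.0, 0.0, 0.0], green, delta)
# High-entropy step: several equally plausible synonyms.
p_flat = watermark_probs([1.0, 1.0, 1.0, 1.0], green, delta)

# The dominant token keeps essentially all its mass (> 0.99), while the
# flat case shifts most probability onto the favored set (~0.88).
```

Since low-entropy steps contribute almost no detectable signal, the detector needs many high-entropy steps to accumulate significance, which is the length trade-off described above.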