Comment by jkhdigital

6 years ago

This is exactly what I was thinking. It seems like he’s just taking advantage of the fact that CJK characters are a lot more information-dense than English letters, and has a turnkey encoder/decoder in the form of the pre-trained GPT-2 language model.

I am, in fact, currently involved in research that uses GPT-2 for format-transforming encryption, and we follow the exact same recipe in reverse: encrypt a message, then "decompress" the resulting ciphertext bits (which look random) into English text using GPT-2. Assuming the receiver has the same GPT-2 state (model parameters and prompt), they will reproduce the same bits by arithmetically compressing the text, and those bits can then be decrypted with the shared key.
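
To make the sender/receiver symmetry concrete, here is a toy sketch of that recipe. A hard-coded bigram table stands in for GPT-2, and per-step Huffman codes stand in for the arithmetic coder (easier to show correctly in a few lines); the names `bigram_probs`, `bits_to_text`, and `text_to_bits` are made up for the example, not anything from our actual system.

```python
import heapq
import itertools

# A tiny stand-in "language model": P(next word | previous word).
# With GPT-2 these tables would be the model's next-token probabilities at each step.
bigram_probs = {
    "<s>":    {"the": 0.5, "a": 0.3, "we": 0.2},
    "the":    {"cat": 0.4, "dog": 0.4, "plan": 0.2},
    "a":      {"cat": 0.5, "plan": 0.5},
    "we":     {"sleep": 0.6, "run": 0.4},
    "cat":    {"sleeps": 0.7, "purrs": 0.3},
    "dog":    {"sleeps": 0.5, "barks": 0.5},
    "plan":   {"works": 1.0},
    "sleep":  {".": 1.0}, "run":   {".": 1.0}, "sleeps": {".": 1.0},
    "purrs":  {".": 1.0}, "barks": {".": 1.0}, "works":  {".": 1.0},
}

def huffman_code(probs):
    """Build a prefix-free code {token: bitstring} for one distribution.
    Ties are broken by a counter over the shared insertion order, so both
    sides derive the identical code."""
    counter = itertools.count()
    heap = [(p, next(counter), {tok: ""}) for tok, p in probs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        p1, _, c1 = heapq.heappop(heap)
        p2, _, c2 = heapq.heappop(heap)
        merged = {t: "0" + b for t, b in c1.items()}
        merged.update({t: "1" + b for t, b in c2.items()})
        heapq.heappush(heap, (p1 + p2, next(counter), merged))
    return heap[0][2]   # a single-token distribution encodes with zero bits

def bits_to_text(bits, model=bigram_probs):
    """Sender: 'decompress' ciphertext bits into cover text."""
    words, pos, prev = [], 0, "<s>"
    while pos < len(bits) or prev != ".":
        if prev == ".":                # start a new sentence if payload remains
            prev = "<s>"
        decode = {b: tok for tok, b in huffman_code(model[prev]).items()}
        prefix = ""
        while prefix not in decode:    # prefix-free, so this always resolves
            prefix += bits[pos] if pos < len(bits) else "0"   # zero-pad the tail
            pos += 1
        prev = decode[prefix]
        words.append(prev)
    return words

def text_to_bits(words, model=bigram_probs):
    """Receiver: re-run the same model and concatenate each word's codeword."""
    bits, prev = [], "<s>"
    for w in words:
        bits.append(huffman_code(model[prev])[w])
        prev = "<s>" if w == "." else w
    return "".join(bits)

ciphertext = "1011001110"              # pretend this came out of the cipher
cover = bits_to_text(ciphertext)
print(" ".join(cover))                 # innocuous-looking cover text
recovered = text_to_bits(cover)
assert recovered[:len(ciphertext)] == ciphertext   # receiver knows the true length
```

The real version has to deal with things this sketch waves away (the two ends must break probability ties identically, and the receiver needs the payload length to trim padding), but the symmetry is the same: generation is decompression, recovery is compression.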

At some point, isn't this just Shannon-style source coding that takes advantage of higher-order probabilities?