Comment by fla
6 years ago
Try swapping a few characters in the compressed string before decompressing and get a totally unrelated, but somewhat plausible, sentence.
Swapping last two:
Swapping first two:
Pretty wild!
It's just adaptive arithmetic coding, with the distribution provided by GPT-2 instead of some other statistical analysis of the source. He uses CJK simply to make the output printable, but it's really just random bits. I mean, it's a neat idea, but certainly not novel.
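For anyone who hasn't seen arithmetic coding before, here is a minimal sketch of the idea, not the author's code. It assumes a toy adaptive model (plain symbol counts over a small made-up alphabet) where the real thing would ask GPT-2 for the next-token distribution, and it uses exact fractions instead of the fixed-width integer arithmetic a practical coder would use. The names `AdaptiveModel`, `encode`, `decode`, `ALPHABET` and `END` are all hypothetical.

```python
import math
from fractions import Fraction

# Toy symbol set (an assumption for this sketch); END marks end of message.
ALPHABET = "abcdefghijklmnopqrstuvwxyz ."
END = "."


class AdaptiveModel:
    """Stand-in for GPT-2: per-symbol counts, updated after every symbol."""

    def __init__(self):
        self.counts = {s: 1 for s in ALPHABET}

    def intervals(self):
        """Map each symbol to its slice [lo, hi) of the unit interval."""
        total = sum(self.counts.values())
        low = Fraction(0)
        out = {}
        for s in ALPHABET:
            p = Fraction(self.counts[s], total)
            out[s] = (low, low + p)
            low += p
        return out

    def update(self, s):
        self.counts[s] += 1


def encode(text):
    """Narrow [0, 1) once per symbol, then emit bits naming a point inside."""
    model = AdaptiveModel()
    low, width = Fraction(0), Fraction(1)
    for s in text:
        lo, hi = model.intervals()[s]
        low, width = low + width * lo, width * (hi - lo)
        model.update(s)
    # Smallest k with 2**-k <= width / 2, so that a k-bit binary fraction
    # (and everything sharing that prefix) fits inside the final interval.
    k = 0
    while Fraction(1, 2 ** k) > width / 2:
        k += 1
    n = math.ceil(low * 2 ** k)
    return format(n, "0{}b".format(k))


def decode(bits, max_len=200):
    """Replay the same model, picking the symbol whose slice holds the value."""
    model = AdaptiveModel()
    value = Fraction(int(bits, 2), 2 ** len(bits))
    low, width = Fraction(0), Fraction(1)
    out = []
    while len(out) < max_len:
        target = (value - low) / width
        for s, (lo, hi) in model.intervals().items():
            if lo <= target < hi:
                break
        out.append(s)
        low, width = low + width * lo, width * (hi - lo)
        model.update(s)
        if s == END:
            break
    return "".join(out)
```

Calling `decode(encode("hello world."))` gives the original string back. Feeding `decode` any other bit string still yields some symbol sequence the model considers possible, which is why random input to the real thing reads like plausible text; `max_len` is only there because arbitrary bits may never hit the end marker.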
After some more tweaking it looks like the most-significant bits come first
I ran this code in Python:
import random
for i in range(20): print(''.join(chr(random.randrange(20000, 25000)) for x in range(4)))
to generate some random text; one of the strings, 劓惂儶宓, turns up this bizarre output:
> Honeybees ( Apis mellifera ) are splendidly beautiful little creatures. They have shapely abdomens, amphistales and pedipalps, round chests, and square backs … all of them beautifully highly marketable. Exactly what has caused the popularity of bees I do not quite know; just what they do is a mystery to me. I beg to differ. The
Tried similar.
Typed in 你好吗 and decompressed it. The decompression was an entertaining read.
oh this is so much fun! It's like tuning an old radio and suddenly hearing speech amid the static. A tiny nudge on the dial and it's a different accent/topic/language altogether.