
Comment by fla

6 years ago

Try swapping a few characters in the compressed string before decompressing and get a totally unrelated, but somewhat plausible, sentence.

   Try swapping a few characters in the compressed string before decompressing and get a totally unrelated, but somewhat plausible, sentence. -->

   䔹䧹焫놉勏㦿顱㦽膑裚躈葊

Swapping the last two:

   䔹䧹焫놉勏㦿顱㦽膑裚葊躈 -->

   Try swapping a few characters in the compressed string before decompressing and get a totally unrelated, but somewhat applied tlh

Swapping the first two:

   䧹䔹焫놉勏㦿顱㦽膑裚躈葊 -->

   Sexy Shania Twain acting as a sprite for sexy Hogan's Alley demo dude

   my site

   my favorite animal's name is camelid 2 my favorite artist is david maile my favorite movie's are

Pretty wild!
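Why would swapping the first two characters scramble the whole sentence while swapping the last two only garbles the tail? Here is a minimal sketch. It assumes, hypothetically, that each character packs 14 bits of the compressed stream, most significant bits first, starting at U+4E00. The real tool's packing may differ, and `BITS_PER_CHAR`, `BASE`, `bits_to_cjk`, `cjk_to_fraction` and `swap` are illustrative names. Read back as a binary fraction, the first character sets the coarsest digits and the last character only the finest. A swap at the front therefore moves the point much farther than a swap at the back.

```python
from fractions import Fraction

BITS_PER_CHAR = 14  # hypothetical: 14 bits per character
BASE = 0x4E00       # hypothetical: start of the CJK Unified Ideographs block


def bits_to_cjk(bits):
    """Pack a bit string into characters, most significant bits first (zero-padded at the end)."""
    bits += "0" * (-len(bits) % BITS_PER_CHAR)
    return "".join(chr(BASE + int(bits[i:i + BITS_PER_CHAR], 2))
                   for i in range(0, len(bits), BITS_PER_CHAR))


def cjk_to_fraction(text):
    """Read the characters back as the binary fraction 0.b1b2b3... in [0, 1)."""
    value = 0
    for ch in text:
        value = (value << BITS_PER_CHAR) | (ord(ch) - BASE)
    return Fraction(value, 2 ** (BITS_PER_CHAR * len(text)))


def swap(s, i, j):
    """Return s with the characters at positions i and j exchanged."""
    chars = list(s)
    chars[i], chars[j] = chars[j], chars[i]
    return "".join(chars)


if __name__ == "__main__":
    # Four characters holding the (arbitrary) 14-bit values 100, 9000, 4321, 777.
    s = bits_to_cjk("".join(format(v, "014b") for v in (100, 9000, 4321, 777)))
    x = cjk_to_fraction(s)
    print("shift from swapping first two:", float(abs(cjk_to_fraction(swap(s, 0, 1)) - x)))
    print("shift from swapping last two: ", float(abs(cjk_to_fraction(swap(s, -2, -1)) - x)))
```

Under an arithmetic decoder, the coarse digits choose the early words. That would fit the demos above: a front swap gives an unrelated sentence, and a back swap gives the same sentence with a mangled ending.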

  • It's just adaptive arithmetic coding, with the probability distribution provided by GPT-2 instead of some other statistical model of the source. He uses CJK characters simply to make the output printable, but underneath it's really just a string of random-looking bits. I mean, it's a neat idea, but certainly not novel.

  • After some more tweaking, it looks like the most significant bits come first.

I ran this code in Python:

   import random
   for i in range(20): print(''.join(chr(random.randrange(20000, 25000)) for x in range(4)))

to generate some random text; one such string, 劓惂儶宓, produced this bizarre output:

> Honeybees ( Apis mellifera ) are splendidly beautiful little creatures. They have shapely abdomens, amphistales and pedipalps, round chests, and square backs … all of them beautifully highly marketable. Exactly what has caused the popularity of bees I do not quite know; just what they do is a mystery to me. I beg to differ. The
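The arithmetic-coding explanation in the first reply can be illustrated with a toy sketch. It uses exact `Fraction` interval arithmetic and a hypothetical order-0 adaptive character model in place of GPT-2's next-token distribution. `ALPHABET`, `AdaptiveModel`, `encode` and `decode` are illustrative names, so this shows the mechanism, not the actual tool. The decoder is told how many symbols to produce. Because every bit string is some point in [0, 1), any input, including random characters, decodes to some text.

```python
import math
from fractions import Fraction

# Hypothetical alphabet for the toy model; GPT-2 works over tokens instead.
ALPHABET = "abcdefghijklmnopqrstuvwxyz ."


class AdaptiveModel:
    """Toy order-0 adaptive model, standing in for GPT-2's next-token distribution."""

    def __init__(self):
        # Laplace smoothing: every symbol starts with count 1 so none has zero probability.
        self.counts = {c: 1 for c in ALPHABET}

    def intervals(self):
        """Map each symbol to its cumulative-probability slice [lo, hi) of [0, 1)."""
        total = sum(self.counts.values())
        out, cum = {}, 0
        for c in ALPHABET:
            out[c] = (Fraction(cum, total), Fraction(cum + self.counts[c], total))
            cum += self.counts[c]
        return out

    def update(self, c):
        # Adapt after each symbol; encoder and decoder make identical updates.
        self.counts[c] += 1


def encode(text):
    model, low, high = AdaptiveModel(), Fraction(0), Fraction(1)
    for c in text:
        lo, hi = model.intervals()[c]
        width = high - low
        low, high = low + width * lo, low + width * hi  # narrow to c's slice
        model.update(c)
    # Emit the shortest bit string b such that the binary fraction 0.b lies in [low, high).
    k = 1
    while True:
        m = math.ceil(low * 2**k)
        if Fraction(m, 2**k) < high:
            return format(m, f"0{k}b")
        k += 1


def decode(bits, n):
    """Decode n symbols from any string of 0s and 1s; every input yields some text."""
    value = Fraction(int(bits, 2), 2 ** len(bits)) if bits else Fraction(0)
    model, low, high, out = AdaptiveModel(), Fraction(0), Fraction(1), []
    for _ in range(n):
        scaled = (value - low) / (high - low)  # position of value inside the current interval
        for c, (lo, hi) in model.intervals().items():
            if lo <= scaled < hi:
                width = high - low
                low, high = low + width * lo, low + width * hi
                out.append(c)
                model.update(c)
                break
    return "".join(out)


if __name__ == "__main__":
    msg = "try swapping a few characters."
    code = encode(msg)
    print(len(code), "bits vs", 8 * len(msg), "bits of ASCII")
    print(decode(code, len(msg)))                # round-trips to msg
    print(decode("1011001110001111", len(msg)))  # arbitrary bits still decode to something
```

With a strong language model in place of the toy counts, those arbitrary decodes would tend to come out as fluent-looking prose rather than letter soup. That is consistent with the random-CJK experiments in this thread.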

I tried something similar.

I typed in 你好吗 ("How are you?") and decompressed it. The output was an entertaining read.

Oh, this is so much fun! It's like tuning an old radio and suddenly hearing speech amid the static. A tiny nudge of the dial and it's a different accent, topic, or language altogether.