
Comment by jrootabega

2 months ago

Here's a POC that works in Emacs. It doesn't cover all of the relevant characters, but:

    ;; some other invisible or interesting characters
    (setq unicode-zero-width-space ?\u200b
          unicode-zero-width-non-joiner ?\u200c
          unicode-zero-width-joiner ?\u200d
          unicode-zero-width-nbsp ?\ufeff
          unicode-narrow-nbsp ?\u202f
          unicode-word-joiner ?\u2060
          unicode-grapheme-joiner ?\u034f
          unicode-no-break-space ?\u00a0
          unicode-combining-long-stroke ?\u0336
          ;; variation selector examples
          unicode-vs-fe00 ?\ufe00
          unicode-vs-fe0f ?\ufe0f
          unicode-vs-e0100 ?\xe0100)


    (defun show-glyphless-as-hex (char)
      "Display CHAR as its hex code; return the previous display method."
      (let ((original (aref glyphless-char-display char)))
        (aset glyphless-char-display char 'hex-code)
        original)) ;; so you can see what you just replaced


    (progn
      (show-glyphless-as-hex unicode-zero-width-space)
      (show-glyphless-as-hex unicode-zero-width-non-joiner)
      (show-glyphless-as-hex unicode-zero-width-joiner)
      (show-glyphless-as-hex unicode-zero-width-nbsp)
      (show-glyphless-as-hex unicode-word-joiner)
      (show-glyphless-as-hex unicode-grapheme-joiner)
      (show-glyphless-as-hex unicode-narrow-nbsp)
      (show-glyphless-as-hex unicode-no-break-space)
      ;;these may already be visible if the current conditions don't support them
      ;;but we'll force them
      (show-glyphless-as-hex unicode-vs-fe00)
      (show-glyphless-as-hex unicode-vs-fe0f)
      (show-glyphless-as-hex unicode-vs-e0100))
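
One way to check the result, as a rough sketch: put a few of the characters between visible letters in a scratch buffer and see whether a hex box (or whatever your terminal shows instead) appears between the "a" and the "b". The buffer name `*glyphless-test*` is made up for this example.

```elisp
;; Sketch: insert some invisible characters between "a" and "b" so that
;; any hex-code display is easy to spot. The buffer name is arbitrary.
(with-current-buffer (get-buffer-create "*glyphless-test*")
  (erase-buffer)
  (dolist (ch (list ?\u200b ?\u200c ?\u200d ?\ufeff ?\u2060))
    (insert (format "U+%04X: a%cb\n" ch ch)))
  (pop-to-buffer (current-buffer)))
```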

And as a higher-level configuration you can set most, maybe even all, of the relevant invisible characters (still not sure how 0x34f grapheme joiner fits in) at once with something like:

  (custom-set-variables
   '(glyphless-char-display-control '((format-control . hex-code)
                                      (variation-selectors . hex-code))))

This will modify values in glyphless-char-display, but it's OK to modify those directly if you need to.
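
To see what a setting actually did, you can look up individual entries in the table, for example with `M-:`. I'd expect `hex-code` for characters that fall into a group you set that way, but I haven't checked which group every character belongs to, so treat that as an assumption.

```elisp
;; Inspect the current display method for a couple of characters.
(aref glyphless-char-display ?\u200b)   ; zero width space
(aref glyphless-char-display ?\ufe0f)   ; variation selector 16
```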

Here is the bare minimum this is built on, which you can type in yourself if you're paranoid or want to start from the bottom up. Swap in the hexadecimal code point of the invisible character after the ?\x:

  (aset glyphless-char-display ?\xfe00 'hex-code)
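
The same idea extends to a whole range with `set-char-table-range`, which accepts a `(FROM . TO)` cons. A sketch, assuming you want every variation selector (U+FE00..U+FE0F plus the supplement at U+E0100..U+E01EF) shown as hex:

```elisp
;; Sketch: mark both variation selector blocks for hex display.
(set-char-table-range glyphless-char-display '(#xfe00 . #xfe0f) 'hex-code)
(set-char-table-range glyphless-char-display '(#xe0100 . #xe01ef) 'hex-code)
```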

I use vim. It seems like `:set binary enc=latin1` works, though I don't understand why the latin1 part is required.