Bad Apple Font

1 year ago (blog.erk.dev)

I thought somehow the animation was playing "by itself," but I guess it was accomplished by holding down the '.' key? The font code swaps a run of N dots with the glyph corresponding to the Nth frame of animation.
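As a sketch of that dot-run mapping (illustrative Rust, not the post's actual code; the base glyph ID and frame count here are invented):

```rust
// Illustrative sketch of the dot-run trick: a run of N dots selects the
// glyph for frame N. The base glyph ID and frame count are made up.
const FIRST_FRAME_GLYPH: u32 = 100; // hypothetical ID of the frame-1 glyph
const FRAME_COUNT: u32 = 6572;      // illustrative; not the real frame count

/// Glyph ID for a run of `n` dots, or None if the run is out of range.
fn frame_glyph_for_dots(n: u32) -> Option<u32> {
    if n == 0 || n > FRAME_COUNT {
        return None;
    }
    Some(FIRST_FRAME_GLYPH + n - 1)
}

fn main() {
    // "." -> frame 1, ".." -> frame 2, and so on.
    assert_eq!(frame_glyph_for_dots(1), Some(100));
    assert_eq!(frame_glyph_for_dots(2), Some(101));
}
```

Holding '.' with key autorepeat then walks `n` upward one step per repeat, which is what plays the animation.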

  • Fontemon [0] makes this a bit more obvious by including a web page with the font embedded, so we can control the animation by typing rather than watching someone else type. However, mmulet uses some sort of Blender project [1], rather than a wasm binary, to accomplish the font shaping.

    [0] https://www.coderelay.io/fontemon.html

    [1] https://github.com/mmulet/code-relay/blob/main/markdown/Tuto...

    • So, the Blender project was just used to create the game (set up each decision tree and the position of all images); from there I compiled everything into complex ligatures in the GSUB table. The wasm binary feature wasn’t around when I made fontemon, but it looks like it would have made development a lot easier!

  • Yeah, in retrospect I really should have added something like an overlay so it was possible to see what keys I pressed.

  • As someone completely ignorant of the inner workings of fonts, how is this different from ligatures? Those also produce special glyphs based on combinations.

Obviously the thought comes up that having WASM in font files feels unsafe, but I'm also aware that font layout engines are already Turing-complete, which leads me to wonder: have there been any high-profile font malware examples? That entire stack feels like a big attack surface to me, especially given that Windows used to render fonts in the kernel.

  • Multiple iOS jailbreaks (both by comex) were buffer overflows of the virtual machine stack, due to bugs in how a few instructions were handled in FreeType's implementation of TrueType font hinting. The resulting exploit was embedded in a PDF file (which was itself deployed by a website), but that was just a convenient way to embed the font and trigger very deterministic hinting: the bug wasn't in the PDF renderer, per se (though I imagine a lot of people in the popular press were confused on that front).

    He open-sourced the exploit concurrently with the website going up, and it was immediately adapted for use against different targets (including Foxit Reader or something like that on Windows), and as FreeType was used by a lot of Linux distributions in addition to iOS, I imagine it was used in a ton of malware (which might or might not have been "high profile"). I actually use those vulnerabilities as a case study in the ethical trade-offs of open-source weaponization in my talks.

    (There were two such jailbreaks, as there were/are separate implementations of two similar yet slightly different virtual machine versions, each of which had bugs that I remember being related to the same fundamental mistake; and, as you can read in another big thread on this website today, most developers think coming up with difficult abstractions isn't worth their effort and would rather fix things by playing whack-a-mole.)

    • Wasn't there also a Telugu glyph that could in some weird corner cases brick an iPhone?

  • Font layout engines are only Turing-complete if the stack is unbounded (to be fair, that's true of actual computers too: they're not Turing-complete because they don't have infinite RAM), and AFAIK the major font engines all impose a quite strict limit on the stack size.

  • Wasm is sandboxed, so it's not really any different than rendering a web view inside an app.

    Note the author had to modify Gimp to get it to run the wasm. It's not something most apps would allow just for font rendering.

    • I only had to enable it in HarfBuzz, as GIMP uses dynamic linking. So I luckily did not have to build it as well.

  • Something like that is the reason that most OSs have dropped support for PostScript Type 1 fonts.

The blog post talks a lot about how he got the frames into the font, but very little about how the animation works.

AFAICT this is how it is done (edit: I am wrong, it uses Wasm):

- The frames of the video are simply stored as glyphs in the font

- There is a ligature mapping from sequences of dots to glyphs (for example "." is mapped to glyph 1, ".." is mapped to glyph 2, "..." is mapped to glyph 3, etc.)

- If you use the font in an editable part of the browser and hold the "." key pressed, dots get added by autorepeat and a growing sequence of dots is inserted. This sequence of dots is converted by the font's ligature mapping to different animation-frame glyphs, thus showing the animation.

I have no idea why WASM and HarfBuzz are needed (it should work in any modern browser without them), but it looks like a fun little experiment.

  • A new experimental feature of HarfBuzz allows the font to include WASM code for the shaper within the font itself. So the code shown in the post is inside the font and getting run "live," rather than being something that generated or modified the ligature tables in the font file in advance.

    I wondered myself about just using "simple" ligatures, but I don't know whether or not it's feasible to statically store several thousand ligature definitions in a font that are each mostly runs of several thousand characters being substituted. But maybe? OpenType has mysterious depths.

    • Should be no problem. GSUB lookup type 4.1* uses a uint16 to store the number of ligatures, so 65000 ligatures should be feasible. To store the actual glyphs, 32-bit offsets are used, so you theoretically have 2 GB of memory available, which should be plenty (although I have never seen a font larger than 15 MB).

      Using Wasm for this animation really is overkill IMHO.

      *) https://learn.microsoft.com/en-us/typography/opentype/spec/g...

      Edit: IIRC ligatures are applied recursively, so you can have a ligature based on other ligatures. If I am right here, each ligature needs to consist of only two glyphs (the glyph of the previous animation frame followed by a dot). This would keep the GSUB table small.
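      A sketch of how such pairwise ligatures would collapse a dot run (illustrative Rust modeling the substitution behavior, not actual GSUB data; glyph IDs are invented):

```rust
// Model of the pairwise-ligature idea: "." alone becomes frame 1, and
// (frame k, ".") ligates to frame k+1, so a run of N dots collapses to
// frame N without storing one huge N-glyph ligature per frame.
const DOT: u32 = 1;           // hypothetical glyph ID of '.'
const FIRST_FRAME: u32 = 100; // hypothetical frame-1 glyph ID
const LAST_FRAME: u32 = 109;  // pretend the font has only 10 frames

fn shape(glyphs: &[u32]) -> Vec<u32> {
    let mut out: Vec<u32> = Vec::new();
    for &g in glyphs {
        if g == DOT {
            // Does the previous output glyph hold a frame we can extend?
            let extend =
                matches!(out.last(), Some(l) if (FIRST_FRAME..LAST_FRAME).contains(l));
            if extend {
                *out.last_mut().unwrap() += 1; // frame k + "." -> frame k+1
            } else {
                out.push(FIRST_FRAME); // "." alone -> frame 1
            }
        } else {
            out.push(g); // anything else passes through
        }
    }
    out
}

fn main() {
    // Four dots end up as the single frame-4 glyph (ID 103).
    assert_eq!(shape(&[DOT, DOT, DOT, DOT]), vec![103]);
}
```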


This reminds me of a torrent on Nyaa that implemented Bad Apple!! in ASS subtitles by retracing the frames into SVG (which seems to give better quality than simply using potrace by itself), converting the SVGs into ASS Dialogue events, and muxing them into a Matroska container with a placeholder video. Therefore the "video" window can be resized without rescaling raster images (and it actually runs well on most hardware and players, unlike his other torrents that put whole anime episodes in subtitles). The subtitle attachment could also be extracted from the container and executed as a valid shell script, which would run mpv or ffplay, use the respective options to create a blank video via a libavfilter filter (to overlay the subtitles on), use its own filename for the subtitles, and play the song by decoding a base64 string at the bottom of the script and piping it to mpv/ffmpeg's stdin.

Okay, now you have the frames as glyphs in the font, but how are they going to animate? The most interesting part of the explanation is missing.

  • The glyph corresponds to how many characters (e.g. dots) you have in a row, and the author set key-repeat to 30/s and held down the '.' key.

    • ...okay, that is less automatic than I was led to expect. I mean, it's still cool, I guess.

  • In the middle of the article you see a line "RUST Full code for character replacement". If you click on that, it will show you the Wasm code.

    It looks like it uses Wasm to replace a sequence of dots with a glyph from the font that shows a frame of the animation, similar to ligatures but using Wasm. You could do the same by storing the SVG paths for each animation frame in an array and then using JavaScript to iterate over and display these paths, but this uses Wasm, HarfBuzz, and a font.
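    For intuition, the buffer transformation amounts to something like this (a standalone Rust sketch, not the actual harfbuzz-wasm API; glyph IDs are invented):

```rust
// Sketch of the shaping pass: walk the input text, collapse each run of
// '.' into a single frame glyph, and pass everything else through.
const FIRST_FRAME_GLYPH: u32 = 100; // hypothetical ID of the frame-1 glyph
const OTHER_GLYPH: u32 = 1;         // stand-in ID for any non-dot character

fn shape_text(text: &str) -> Vec<u32> {
    let mut out = Vec::new();
    let mut run = 0u32; // length of the current dot run
    for ch in text.chars() {
        if ch == '.' {
            run += 1;
        } else {
            if run > 0 {
                out.push(FIRST_FRAME_GLYPH + run - 1); // N dots -> frame N
                run = 0;
            }
            out.push(OTHER_GLYPH);
        }
    }
    if run > 0 {
        out.push(FIRST_FRAME_GLYPH + run - 1); // flush a trailing run
    }
    out
}

fn main() {
    // Three dots collapse to the frame-3 glyph; "a" passes through.
    assert_eq!(shape_text("...a"), vec![102, 1]);
}
```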

    • More importantly, seeing the other comments, it uses those and a keyboard.

Stupid question: Is this an entire blog post about an animated font without showing the animation in action, or does it simply not work on my device? (iOS 15) I’m not sure where to look.

  • Yes, it is, and that's because the trick is that it relies on ligatures to combine series of dots into frames. One '.' shows the first frame, '..' the second, '...' the third, you get the idea.

    The only way to animate the font is thus to hold down the . key, which you can't really do in a blog post, at least without some custom JavaScript.

    • Sure, but they could have embedded the video in the page, or at the very least included some screenshots of individual frames.

I don’t understand the WASM part at all and I feel dumb.

How can WASM be in a font? A font is a font, not a WASM file. It’s a different format.

  • In theory, a font is purely a set of vector graphics. In practice, just rasterizing vector graphics usually doesn't lead to good results at small font sizes combined with low pixel density, so the vector graphics need to be adjusted to better fit the pixel grid. There are multiple ways to do it; one of them is to write a script that adjusts the graphics so they better fit the pixel grid. For example, TrueType fonts contain a virtual machine capable of just that [1]. By conceptual extension, a font format might just as well contain a full-blown virtual machine with potentially a program per glyph, and WASM is a reasonable candidate for something like that.

    - [1] https://learn.microsoft.com/en-us/typography/truetype/hintin...
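    The core of what such a hinting program does can be shown in a few lines (a toy Rust sketch, not TrueType bytecode; real hinting is far more involved):

```rust
// Toy illustration of grid fitting: at small sizes, outline coordinates
// rarely land on pixel boundaries, so a hinting program nudges them onto
// the grid. Real TrueType hinting is a bytecode VM; this shows the idea.

/// Scale a coordinate from font units to pixels and snap it to the grid.
/// `units_per_em` is the font's design grid (commonly 1000 or 2048),
/// `ppem` is the target pixel size.
fn grid_fit(coord_font_units: f64, units_per_em: f64, ppem: f64) -> f64 {
    let pixels = coord_font_units * ppem / units_per_em;
    pixels.round() // snap to the nearest pixel boundary
}

fn main() {
    // A stem edge at 123 font units, rendered at 12 ppem in a 1000-unit em:
    // 123 * 12 / 1000 = 1.476 px, which snaps to 1.0 px.
    assert_eq!(grid_fit(123.0, 1000.0, 12.0), 1.0);
}
```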

    • But then nobody can interpret the machine except software you write yourself, which makes the whole exercise useless; you could just read the wasm directly?

      I’m surely missing something btw


  • WASM is a binary format and can be embedded in other data formats.

    • But then what will actually interpret the WASM? The font reader? That doesn’t have WASM compatibility?

I always love a reference to Bad Apple

Cool idea. Something very similar is possible without WASM in the font renderer. I remember a font that displayed its own size, so it changed as you zoomed in and out. I think you could adapt that to play a short animation.

In the future, we have given up on fonts and unicode...

All communication will be through sequences of SVG images and animations.

  • Picard and Riker sitting side by side, their faces buried in their hands.

A bit offtopic: I like the design of the blog, but reading prose in a monospaced font is just not very pleasant, I think. I recently searched for a technical-looking font that is not monospaced, but all (freely available) coding fonts I found only have monospaced variants, and all non-monospaced fonts didn't look like coding fonts. Any ideas?

Stuff like this does get me excited, as a web developer who's always wanted to explore graphics but never really dipped into native development.

On the other hand, I know enough to know that Chromium uses Harfbuzz and Skia to render a webpage that, in itself, is going to use another instance of Harfbuzz and Skia to render into a canvas element. Intuitively, it feels dirty.

It shows how current font rendering systems have accumulated quite a bit of bloat.

This makes me nostalgic for bitmap fonts.

  • I don’t think this is really bloat.

    OpenType supported ligatures in ‘96. PostScript Type 1 and even Knuth’s TeX supported ligatures to a certain extent.

    It’s a pretty standard base-level feature for any sort of typesetting.

    Imo this is akin to making a terminal animation by outputting blocks of ASCII art. It's not that terminals added video playback support (which would be bloat); instead, someone pushed a standard feature to a novel extent.

  • The goal is to print with a computer something like a late-15th-century humanist document, a tradition of typography 500 years old, not to render on a 200x320 screen, which had a tradition of merely a decade.

    The early computer age of the 80s and 90s was merely playing catch-up to established standards. The standards of the 80s and 90s are not what we wanted to achieve ultimately.

    Same with cinema: we shot on 4K-equivalent film for the past 100 years; only in the 80s and 90s, with computerization and videotape, did we have a temporary standard of 480i, which we have since overcome with sheer computing power, and we're back to where we actually wanted to be in the beginning.