Llama.ttf: A font which is also an LLM

7 months ago (fuglede.github.io)

After watching part of the video, I believe the world would benefit from a weekly television program where you could tune in each week to watch something weird, brilliant and funny. This would be a great episode #1 for that television show.

This is cool. One practical issue, though (aside from the 280 GB TTF file!), is that it makes the text incompatible with all other fonts: if you copy and paste your "improved" text then it will no longer say what you thought it did. The font just alters the presentation, not the content. I guess you would have to OCR it to get the content as you see it.

I was wondering why this was never used for a simpler autocorrect, but I guess that's why.

Also, perhaps someone more educated on LLMs could tell me: this wouldn't always be consistent right? Like "once upon a time _____" wouldn't always output the same thing, yes? If so, even copying and pasting on your own system using the correct font could change the content.

  • > if you copy and paste your "improved" text then it will no longer say what you thought it did

    It's not a bug, it's a feature - a DRM. Your content can now be consumed, but cannot be copied or modified - all without external tools, as long as you embed that TTF somehow.

    Which kind of reminds me of the PDF invoices I got from my electricity provider. They looked and printed perfectly fine, but used a weird codepoint mapping which resulted in complete garbage when trying to copy any text from them. Fun times, especially when pasting an account number into a banking app.

    • This is why pretty much all software that extracts structured data from PDFs throws away the text and just OCRs the page. Too many tricks with layouts and fonts.


  • The small model/TTF is only 60MB.

    The 280GB you saw is the Llama3-70B model, which is basically ChatGPT-level (if not better).

  • If there's any randomness involved in inference, it ought to be deterministic as long as the same seed is used each time.

    • Is there even any possibility of using a different seed? I'd doubt the WASM shaper has access to any source of non-determinism.

  • > this wouldn't always be consistent right? Like "once upon a time _____" wouldn't always output the same thing, yes?

    Would be cool if you could turn up/down the LLM’s temperature by pressing different keys other than just !!!!

    Say, pressing the number keys 0-9.
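To make the seed and temperature points above concrete, here is a minimal sketch. It is not the font's actual code: `TOY_LOGITS`, `pick_token` and `generate` are hypothetical names, and the fixed score table only stands in for a real model's output. It shows that greedy decoding (temperature 0) ignores the seed entirely, while sampling at a higher temperature is still reproducible as long as the seed is the same.

```python
import math
import random

# Hypothetical toy "model": fixed next-token scores standing in for an LLM.
TOY_LOGITS = {"there": 2.0, "in": 1.5, "a": 1.0, "long": 0.5}


def pick_token(logits, temperature, rng):
    """Pick one token: greedy at temperature 0, softmax sampling otherwise."""
    if temperature == 0:
        # Greedy decoding: always the highest-scoring token, no randomness.
        return max(logits, key=logits.get)
    tokens = list(logits)
    # Softmax weights scaled by temperature (higher means flatter).
    weights = [math.exp(logits[t] / temperature) for t in tokens]
    return rng.choices(tokens, weights=weights, k=1)[0]


def generate(n, temperature, seed):
    """Generate n tokens with a private, seeded random number generator."""
    rng = random.Random(seed)
    return [pick_token(TOY_LOGITS, temperature, rng) for _ in range(n)]


# Temperature 0: the seed does not matter.
print(generate(5, 0, seed=1) == generate(5, 0, seed=999))
# Temperature 1: same seed, same output.
print(generate(20, 1.0, seed=42) == generate(20, 1.0, seed=42))
```

Under these assumptions, a shaper with no access to a clock or other entropy source would behave like the fixed-seed case: the same input text always renders the same way.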

While cool, technically… From a security perspective today I learned that TrueType fonts have arbitrary code execution as a ‘feature’ which seems mostly horrific.

  • (Sadly) this is nothing new. Years ago I wrangled a (modified) bug in the font rendering of Firefox [1, 2016] into an exploit (for a research paper). Short version: the Graphite2 font rendering engine in FF had/has? a stack machine that can be used to execute simple programs during font rendering. It sounded insane to me back then, but I dug into it a bit. Turns out while rendering Roman based scripts is relatively straightforward [2], there are scripts that need heavy use of ligatures etc. to reproduce correctly [3]. Using a basic scripting (heh) engine for that does make some sense.

    Whether this is good or bad, I have no opinion on. It is "just" another layer of complexity and attack surface at this point. We have programmable shaders, rowhammer, speculative execution bugs, data timing side channels, kernel level BPF scripting, prompt injection and much more. Throwing WASM based font rendering into the mix is just balancing more on top of the pile. After some years in the IT security area, I think there are so many easier ways to compromise systems than these arcane approaches. Grab the data you need from a public AWS bucket or social engineer your access, far easier and cheaper.

    For what it's worth, I think embedded WASM is a better idea than rolling your own ecosystems for scripting capabilities.

    [1] https://bugzilla.mozilla.org/show_bug.cgi?id=1248876

    [2] I know, there are so many edge cases. I put this in the same do not touch bucket as time and names.

    [3] https://scripts.sil.org/cms/scripts/page.php?id=cmplxrndexam...

  • If you think that's bad, until very recently, Windows used to parse ttf directly in the kernel, meaning that a target could look at a webpage, or read an email, and be executing arbitrary code in ring0.

    Last I checked there were about 4-10 TTF bugs discovered and actively exploited per year. I think I heard those stats in 2018 or so. This has been a well known and very commonly exploited attack vector for at least 20 years.

  • It's technically not arbitrary. There is a stack, of sorts, but IIRC it has a depth of six or so, by default. You can do cool stuff with font shaping, but you can't easily execute arbitrary code.

  • Not really, no more so than a random webpage running js/WASM in a sandbox.

    The only output from the WASM is to draw to screen. There is no chance of a RCE, or data exfiltration.

    • The risk is that you could have the text content say one thing while the visual display says another. There are social engineering and phishing risks.


    • > Not really, no more so than a random webpage running js/WASM in a sandbox.

      ... except that it can happen in non-browser contexts.

      Even for browsers, it took 20+ years to arrive at a combination of ugly hacks and standard practices where developers who make no mistakes in following a million arcane rules can mostly avoid the massive day-one security problems caused by JavaScript (and its interaction with other misfeatures like cookies and various cross-site nonsense). During all of which time the "Web platform" types were beavering away giving it more access to more things.

      The Worldwide Web technology stack is a pile of ill-thought-out disasters (or, for early, core architectural decisions, not-thought-out-at-all disasters), all vaguely contained with horrendous hackery. This adds to the pile.

      > The only output from the WASM is to draw to screen.

      Which can be used to deceive the user in all kinds of well-understood ways.

      > There is no chance of a RCE, or data exfiltration.

      Assuming there are no bugs in the giant mass of code that a font can now exercise.

      I used to write software security standards for a living. Finding out that you could embed WASM in fonts would have created maybe two weeks of work for me, figuring out the implications and deciding what, if anything, could be done about them. Based on, I don't know, a hundred similar cases, I believe I probably would have found some practical issues. I might or might not have been able to come up with any protections that the people writing code downstream of me could (a) understand and (b) feasibly implement.

      Assuming I'd found any requirements-worthy response, it probably would have meant much, much more work than that for the people who at least theoretically had to implement it, and for the people who had to check their compliance. At one company.

      So somebody can make their kerning pretty in some obscure corner case.

    • It's still horrible, not in a (direct) security sense but in an interop sense: now you have to embed an entire WASM engine, including proper sandboxing, just to render the font correctly. That's a huge increase in complexity and attack surface.


    • I’m open to your idea, but can you explain in technical terms why a wasm sandbox is invulnerable to the possibility of escape vulnerabilities when other flavors of sandboxes have not been?

> The font shaping engine Harfbuzz, used in applications such as Firefox and Chrome, comes with a Wasm shaper allowing arbitrary code to be used to "shape" text.

Has there already been a proposal to add scripting functionality to Unicode itself? Seems to me we're not very far from that anymore...

  • Considering the actual complexity of rendering e.g. Urdu in a decent, native-looking way, you presumably do want some Turing-complete capabilities at least in some cases, cf. "One handwritten Urdu newspaper, The Musalman, is still published daily in Chennai. InPage, a widely used desktop publishing tool for Urdu, has over 20,000 ligatures in its Nastaʿliq computer fonts." (https://en.wikipedia.org/wiki/Urdu#Writing_system)

    Edit: the OP uses this exact use case, Urdu typesetting, to justify WASM in Harfbuzz (video around 6:00); seems like Urdu has really become the poster child for typographic complexity these days.

  • You mean encoding executable code in plain text files, that execute when you open them? No, that seems unnecessary and very insecure.

My takeaway is that if you can efficiently simulate rendering raster graphics with text ligatures, you could run Doom in a TTF.

Right?

> The font shaping engine Harfbuzz, used in applications such as Firefox and Chrome, comes with a Wasm shaper allowing arbitrary code to be used to "shape" text.

In that case could you ship a live demo of this that's a web page with the font embedded in the page as a web font, such that Chrome and Firefox users can try it out without installing anything else?

>build Harfbuzz with -Dwasm=enabled and build wasm-micro-runtime, then add the resulting shared libraries, libharfbuzz.so.0.60811.0 and libiwasm.so to the LD_PRELOAD environment variable before running a Harfbuzz-based application such as gedit or GIMP

It'd be lovely if someone embedded the font in a website form to save us all the trouble of demoing it

  • It would not be of much use as no browser enables this experimental feature. So unless you somehow build a wasm build of Harfbuzz with the feature enabled and embed it on there nothing will happen.

> The font shaping engine Harfbuzz, used in applications such as Firefox and Chrome, comes with a Wasm shaper allowing arbitrary code to be used to "shape" text.

Oh, this can't be used for nefarious purposes. What could POSSIBLY go wrong?!

Well this definitely won't get exploited at all or lead to new strict limits on what Harfbuzz/WASM can do

  • WASM sandboxing is pretty good! Together with the presumably very limited API with which this can communicate with the outside world, I wouldn't be too concerned.

    To me, it's a great reminder that the line between well-sandboxed Turing-complete execution environments and messy implementations of decoders for "purely declarative" data formats can be quite blurry.

    Said differently, I'd probably trust Harfbuzz/WASM more than the average obscure codec implementation in ffmpeg.

    • Is there scientific proof of the above claim that "WASM sandboxing is pretty good!"?

      At least most if not all ffmpeg decoders and demuxers are fuzzed all the time and any found issue is addressed.


Does this mean fonts are Turing complete nowadays? Sounds like a pretty bad idea for security.

  • TrueType fonts have had a Turing complete virtual machine (almost?) since the beginning. It is used for "hinting" to allow partially colored pixels at low resolutions to remain legible. It's basically a program that decides whether to color a pixel or not to allow fine tuning of low resolution rasterization.

    This isn't used as much today with modern large resolutions where we can get decent image quality from just rasterizing the font outline with anti aliasing.

    This example, however, uses Wasm embedded in TTF fonts, which is not the same as TTF hinting bytecode.

    • > TrueType fonts have had a Turing complete virtual machine (almost?) since the beginning. It is used for "hinting" to allow partially colored pixels at low resolutions to remain legible. It's basically a program that decides whether to color a pixel or not to allow fine tuning of low resolution rasterization.

      That sounds like an awful idea, too. I think a font file should describe the font's form, but it should not describe how it is gonna be rendered. That should be up to the render engine of the device that is going to display the font (printer driver, monitor driver...). But I guess this idea is from a time when people were still using bitmap fonts.


  • Apparently the font can only embed WASM, which is sandboxed so it can't do anything except turning a buffer of codepoints into glyphs and positioning them.

    Of course, back in the 1990s Java and Flash were supposed to be sandboxed. So who knows?

WebAssembly in fonts doesn't sound very secure, coming from someone who is certified in cybersecurity and has spent years doing font stuff.

  • Yes, that's the general consensus in the comments. It doesn't even sound safe to me and I'm not a full security pro. But OP did it as a PoC/for fun. It's okay to have fun still.

I never imagined a future in which PDFs talked back. Now I can.

  • PostScript files are dynamic code. You can create polygons dynamically with commands. And, of course, font effects, styles, ellipses...

    Also, there's a Z-machine interpreter (text adventure player) written in PostScript which can play Zork and some libre games such as Calypso with just Ghostscript, the PostScript interpreter most software uses to render PostScript files.

The author categorizes this as "pointless", but some things I can think of are being able to create automated workflows within an app that didn't previously allow it or had limited scope, and then creating interoperability with other apps using the same method.

  • You mean via Wasm shaping in general or an embedded LLM in particular? Because I don't see why you need an LLM for that.

This is really cool, but I'm left with a lot of questions. Why does the font always generate the same string to replace the exclamation points as he moves from gedit to gimp? Shouldn't the LLM be creating a new "inference"?

As an aside, I originally thought this was going to generate a new font "style" that matched the text. So for example, "once upon a time" would look like a storybook style font or if you wrote something computer science-related, it would look like a tech manual font. I wonder if that's possible.

  • So, another poster cleared up my first question. It's probably because the seed is the same. I think it would have been a better demo if it hadn't been, though.

    • You got it, same seed in practice, but also just temperature = 0 for the demo actually. A few things I considered adding for the fun of it were 1) a way to specify a seed in the input text, 2) a way of using a symbol to say "I didn't like that token, try to generate another one", so you could use, say, "!" to generate tokens and "?" to replace the last generated token. So you would end up typing things like

      "Once upon a time!!!!!!!!!!!!!!!!!!!!!!!!!!!!!SEED42!!!!!??!!!??!"

      and 3) actually just allowing you to override the suggestions by typing letters on your own, to be used in future inferences. At that point it'd be a fairly generic auto-complete kind of thing.
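For illustration only, the "!" / "?" / SEED idea above could be sketched roughly like this. Nothing here comes from the actual font: `VOCAB` and `render` are hypothetical, a seeded random pick from a tiny word list stands in for the model, and the sketch assumes the prompt itself contains no "!", "?" or "SEED" followed by digits.

```python
import random
import re

# Hypothetical toy vocabulary standing in for a real model's tokens.
VOCAB = [" there", " was", " a", " small", " font", " that", " could", " talk"]


def render(text, default_seed=0):
    """Return what would be displayed for `text` under the toy rules."""
    # Everything before the first control sequence is the prompt.
    match = re.search(r"SEED\d+|[!?]", text)
    if match is None:
        return text
    prompt, controls = text[:match.start()], text[match.start():]

    rng = random.Random(default_seed)
    generated = []
    # Other characters in the control part are ignored in this sketch.
    for piece in re.findall(r"SEED\d+|[!?]", controls):
        if piece == "!":
            # "!" generates one more token.
            generated.append(rng.choice(VOCAB))
        elif piece == "?":
            # "?" re-rolls the last generated token, if there is one.
            # Note the re-roll may happen to pick the same token again.
            if generated:
                generated[-1] = rng.choice(VOCAB)
        else:
            # "SEED<n>" reseeds the generator for everything after it.
            rng = random.Random(int(piece[4:]))
    return prompt + "".join(generated)


print(render("Once upon a time!!!SEED42!!??!"))
```

Because all randomness comes from the seed embedded in the text, pasting the same string elsewhere would render identically, which is the property the seed-in-text idea relies on.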


Wow, this is incredible. OP you (I?) should train a few models with different personalities/tasks and pair them with the 5 GitHub Monaspace fonts accordingly, allowing people in multifont programs to easily get different kinds of help in different situations. Lots of little ideas sparked by this… in general, I think this is a good reminder that we are vastly underestimating fonts in discussions of UI (and, it appears, UX in full!)

It seems like it'd be possible to, instead of typing multiple exclamation points, have one trigger-character (eg. ). And then replace that character visually with an entire paragraph of text, assuming there aren't limits to the width of a character in fonts. I suppose the cursor and text wrapping would go wonky, though.

You could also use this to make animated fonts. An excuse to hook up a diffusion model next?

I may be doing this wrong but... the font provided just installs as OpenSans and does not provide any functionality, at least in Mousepad or LibreOffice Writer. I am talking about the 90 MB one.

  • Yeah, sorry, that could have been clearer, I added a few more instructions. Basically, chances are that even if you've got Harfbuzz running, you're still running a version with no Wasm runtime. If so, chances are you can get away with building it with Wasm support, then add the built library to LD_PRELOAD before running the editor.

    • That was useful. I have indeed compiled and installed wasm-micro and now meson build it successfully. Tho "meson compile -C build" returns an error about not finding "hb-wasm-api-list.hh". Do you have any experience of that?

      EDIT: Never mind. Using the exact commits you linked gives another error (undefined reference to wasm_externref_ref2obj). I give up.


your engineers were so busy finding out if they could, they never stopped to ask if they should!

this is over my head

  • The critical part is knowing that TTF fonts can include a virtual machine. Then he pops an LLM into that and replaces instances of !!!!!! with whatever the LLM outputs.

    • Not exactly. Harfbuzz, the font shaping library, has an optional feature to use WASM for shaping. Normal font hinting is much more restricted, precisely because Turing-complete fonts are a horrible idea.

first time I've heard of harfbuzz.

So we could expect latex.ttf very soon?

Stopped watching when the demo showed the letter O with a slash. That would confuse me a lot. I am an old timer and expect the zero to have it.

  • It's not possible to write the letter Ø without a slash. The slash is part of the letter.