Comment by exDM69

1 day ago

Very cool stuff, text rendering is a really hairy problem.

I also got nerd sniped by Sebastian Lague's recent video on text rendering [0] (also linked to in the article) and started writing my own GPU glyph rasterizer.

In the video, Lague makes a key observation: most curves in fonts (at least for the Latin alphabet) are monotonic. A monotonic Bezier curve is contained within the bounding box of its end points (this applies to any monotonic curve, not just Beziers). Curves that are not monotonic are very easy to split: solve the zeros of the derivative (a linear equation) and split the curve at those points. This is also where Lague went astray and attempted a complex procedure using geometric invariants, when it's trivially easy to split Beziers using de Casteljau's algorithm, as described in [1]. It made for entertaining video content, but I was yelling at the screen for him to open Pomax's Bezier curve primer [1] and just get on with it.

For monotonic curves, it is computationally cheap to solve the winding number for any pixel outside the bounding box of the curve: +1 if the pixel is to the right of or below the bounding box, -1 if to the left or above, and 0 in the diagonal regions outside the "plus sign" shape.
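In code, the per-pixel classification reads roughly like this (a Python sketch of the region test described above, not the author's shader; it assumes raster coordinates with y growing downward, and the real contribution sign still has to follow the curve's direction):

```python
# Winding-number class of a pixel against a monotonic curve's
# endpoint AABB. The box's arms form the "plus sign"; the four
# diagonal corner regions contribute nothing.
def aabb_winding(px, py, xmin, ymin, xmax, ymax):
    in_x = xmin <= px <= xmax
    in_y = ymin <= py <= ymax
    if in_y and px > xmax:
        return +1  # right arm
    if in_x and py > ymax:
        return +1  # below (y grows downward)
    if in_y and px < xmin:
        return -1  # left arm
    if in_x and py < ymin:
        return -1  # above
    if in_x and in_y:
        return None  # inside the box: evaluate the curve per pixel
    return 0  # diagonal corners
```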

Furthermore, this can be extended to solving the winding number for an entire axis-aligned box at once. This can be done for an entire GPU warp (32 to 64 threads): each thread in a warp looks at one curve and checks whether the winding number is the same for the whole warp; if so, it accumulates it once, and if not, it sets a bit marking that curve for per-thread evaluation.
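The warp-wide test amounts to classifying the whole tile of pixels covered by the warp against the curve's box in one step. Here is a scalar Python stand-in for that vote (hypothetical names, my own sketch; on the GPU this would be a subgroup ballot/all() across lanes rather than an explicit tile box):

```python
# Classify an axis-aligned pixel tile against a monotonic curve's
# endpoint AABB. Returns +1 / -1 / 0 when every pixel in the tile
# falls in the same region, or None when the tile straddles regions
# and the curve must be evaluated per pixel. Assumes y grows downward.
def tile_winding(tile, box):
    txmin, tymin, txmax, tymax = tile
    bxmin, bymin, bxmax, bymax = box
    in_x = txmin >= bxmin and txmax <= bxmax  # tile within box's x-range
    in_y = tymin >= bymin and tymax <= bymax
    if in_y and txmin > bxmax:
        return +1  # entirely in the right arm
    if in_x and tymin > bymax:
        return +1  # entirely below
    if in_y and txmax < bxmin:
        return -1  # entirely in the left arm
    if in_x and tymax < bymin:
        return -1  # entirely above
    if (txmin > bxmax or txmax < bxmin) and (tymin > bymax or tymax < bymin):
        return 0   # entirely in one diagonal corner
    return None    # straddles regions: fall back to per-pixel evaluation
```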

In this way, very few pixels actually need to solve the quadratic equation for a curve in the contour.

There's still one optimization I haven't done: solving the quadratic equation for 2x2 pixel quads. I solve both vertical and horizontal winding numbers for good robustness on horizontal and vertical lines. But the solution of the horizontal quadratic for a pixel and the pixel below it is the same +/- 1, and likewise for the vertical. So you can solve the quadratics for two curves (a square root and a division, both expensive arithmetic ops) for the price of one if you work on 2x2 quads and use a warp-level shuffle to exchange the results and add or subtract 1. This can only be done in orthographic projection without rotation, but the rest of the method also works with perspective, rotation and skew.

For a bit of added robustness, Jim Blinn's "How to solve a quadratic equation?" [2] can be used to get rid of some pesky numerical instability.
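The core trick from that note: compute the root that doesn't suffer catastrophic cancellation, then derive the other from the product of the roots. A rough Python sketch of the idea (not Blinn's exact formulation, which also deals with coefficient scaling):

```python
import math

# Roots of a*t^2 + b*t + c = 0. The textbook formula subtracts nearly
# equal numbers when b^2 >> 4ac; computing q with a sign-matched sqrt
# and returning q/a and c/q avoids the cancellation.
def solve_quadratic_stable(a, b, c):
    disc = b * b - 4.0 * a * c
    if disc < 0.0:
        return ()  # no real roots: the ray misses the curve
    q = -0.5 * (b + math.copysign(math.sqrt(disc), b))
    roots = []
    if a != 0.0:
        roots.append(q / a)
    if q != 0.0:
        roots.append(c / q)
    return tuple(roots)
```

Note that the degenerate (linear) case a == 0 falls out for free: only c/q survives, which is the root of b*t + c = 0.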

I'm not quite done yet, and I've only got a rasterizer, not the other parts you need for a text rendering system (font file I/O, text shaping, etc.).

But the results are promising: I started at 250 ms per frame rendering an '@' character with 80 quadratic Bezier curves at 4K, evaluating every curve at every pixel, and got down to 15 ms per frame by applying the warp and monotonic-bounding-box optimizations.

These numbers are not very impressive because they were measured on a 10-year-old integrated laptop GPU. It's so much faster on a discrete gaming GPU that I could stop optimizing here if that were my target hardware. But it's already fast enough for real-time practical use on the laptop, because I was drawing entire-screen-sized glyphs for the benchmark.

[0] https://www.youtube.com/watch?v=SO83KQuuZvg

[1] https://pomax.github.io/bezierinfo/#splitting

[2] https://ieeexplore.ieee.org/document/1528437

>Monotonic Bezier curves are contained within the bounding box of its end points

What's a "monotonic Bezier curve"?

Btw, every Bezier curve is contained within the convex hull of its control points. This follows from the fact that every point on a Bezier curve is a convex combination of the control points: the Bezier basis functions sum to 1 and are nonnegative everywhere.

  • > What's a "monotonic Bezier curve"?

    Good question!

    It's a Bezier curve whose derivative (in both x and y) is non-zero for t in 0..1 (exclusive).

    Just your high school calculus definition of monotonic.

    To get from a general quadratic Bezier to monotonic sections, solve the derivative for zeros in the x and y directions (a linear equation each). If a zero t_0x or t_0y lies strictly between 0 and 1, split the curve there using de Casteljau's algorithm. Each quadratic Bezier yields one to three monotonic sections.
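    In Python-ish form the whole procedure is short (a sketch of the steps above with hypothetical names, points as (x, y) tuples):

```python
def split_quad(p0, p1, p2, t):
    """de Casteljau split of a quadratic Bezier at parameter t."""
    lerp = lambda a, b, s: (a[0] + (b[0] - a[0]) * s,
                            a[1] + (b[1] - a[1]) * s)
    a = lerp(p0, p1, t)
    b = lerp(p1, p2, t)
    m = lerp(a, b, t)      # point on the curve at t
    return (p0, a, m), (m, b, p2)

def monotonic_sections(p0, p1, p2):
    """Split a quadratic Bezier at the zeros of x'(t) and y'(t).

    B'(t) = 2*((1-t)*(p1-p0) + t*(p2-p1)), so each axis gives one
    linear equation with root t = d0 / (d0 - d1). Splitting at the
    roots in (0, 1) yields one to three monotonic pieces."""
    ts = []
    for i in (0, 1):  # x axis, then y axis
        d0 = p1[i] - p0[i]
        d1 = p2[i] - p1[i]
        if d0 - d1 != 0:
            t = d0 / (d0 - d1)
            if 0 < t < 1:
                ts.append(t)
    pieces = []
    prev = 0.0
    for t in sorted(ts):
        # Re-parameterize the global split point into the remaining piece.
        local = (t - prev) / (1 - prev)
        left, (p0, p1, p2) = split_quad(p0, p1, p2, local)
        pieces.append(left)
        prev = t
    pieces.append((p0, p1, p2))
    return pieces
```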

    > every Bezier curve is contained within its control points' convex hull.

    This is true, but only monotonic Bezier curves are contained within the AABB formed by their two end points (so the middle control points don't have to be considered outside the AABB).

    For a quadratic Bezier this means that it is monotonic iff the middle control point is inside the AABB of the two end points.
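    That makes the monotonicity test a one-liner (my sketch, same point-as-tuple convention as before):

```python
# A quadratic Bezier (p0, p1, p2) is monotonic in both x and y iff the
# middle control point lies inside the AABB of the two end points.
def is_monotonic_quad(p0, p1, p2):
    return all(min(p0[i], p2[i]) <= p1[i] <= max(p0[i], p2[i])
               for i in (0, 1))
```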

    Monotonicity is a requirement for all the GPU warp-level AABB magic to happen (which is a nice 10x to 20x perf increase in my benchmarks). At worst you'd have to deal with 3x the number of curves after splitting (still a win), but because most curves in fonts are already monotonic, splitting doesn't increase the number of curves much in practice.

    Monotonicity also implies that the quadratic equation has at most one solution in range for any horizontal or vertical line. No need to classify the roots as in Lengyel's method.

This sounds amazing; I'd personally love to take a look at the project or read a blog post when you're done.

  • Thanks for the words of encouragement. I need more time to turn this prototype into a practical piece of software and/or a publication (blog post or research paper). Unfortunately there are only so many hours in a day.

    To actually render text I would still need to add text shaping to get from strings of text to pixels on the screen.

    But a bigger problem is how to integrate it into someone else's software project. I have the rasterizer GPU shader, some CPU preprocessing code, and the required 3D API code (I'm using Vulkan). I'm not sure how people usually integrate these kinds of components into their software: do they just want the shader and the preprocessor, or do they expect the 3D API code too? Packaging it into a library that includes the Vulkan code has its own interoperability problems.

    You'll have to hope it rains during my summer vacation; maybe then this project will see some progress :)

I find it amazing how many people are working and/or playing with text rendering at the moment. I work in Unity and have also been dipping my toes in, trying to think of some modern enhancements to the whole thing.

The Unity solution is quite limiting. Someone made a great asset about 10 years ago and was actively developing it, but then Unity bought it and integrated it into their subsystem, after which all development basically stopped. Nowadays a lot of games use Slug, and it renders some really beautiful text, but unfortunately they only license it at large scale to big companies.

Some thoughts I've come up with:

1. Creating text is basically a case of texture synthesis. We used to just throw it into textures and let the GPU handle it, but obviously it's a general-purpose device, not meant to know which information (such as edges) is more important than the rest. Even a 2K texture of a letter doesn't look 100% when you view it at full-screen size.

2. Edge Lines: I have a little business card here from my Doctor's office. At e.g. 20-30cm normal human reading distance the text on it looks great, but zooming in close you can see a lot of differences. The card material isn't anywhere close to flat and the edge lines are really all over the place, in all kinds of interesting ways.

3. Filling: The same happens for the center filled part of a letter or vector. When you zoom in you can see a lot of flaws creep in, e.g. 5% of the visible layer has some color flaw or other. There are black flecks on the white card, white/black flecks in a little apple logo they have, etc.

4. So basically I want to add a distance parameter as well as size. Both of these are relatively irrelevant for rendering normal 2D text, but in 3D people will often stand right up against something, so the extra detailing adds a lot. For the synthesis part, there's no reason any of the lines should be a solid fill instead of, for example, an artistic brush stroke, or derivative noise curves that create extra, stable detail.

5. Another thing to look at might be work subdivision. Instead of rendering a whole set of letters in N time, if the camera is relatively stable we can refine those over successive frames to improve detailing, for example go from 2 to M subdivisions per curve.

6. There are numerous available examples, such as the following concurrent B-Tree: https://www.youtube.com/watch?v=-Ce-XC4MtWk In their demo they fly into e.g. a planetary body and keep subdividing the visible on-screen terrain down to some minimum size; then they can synthesize extra coherent detail matching that zoom level from some map, in their case e.g. noise for the moon, or code for water, etc.

I find that a lot of the people working on text are doing their own thing data-structure-wise, instead of looking at these already-implemented and proven concurrent solutions. Here is another interesting paper: DLHT: A Non-blocking Resizable Hashtable with Fast Deletes and Memory-awareness (https://arxiv.org/abs/2406.09986).

Not to say that those are the way to parallelize the work, or even that parallelizing is necessary, but it might be an interesting area where one can increase detailing.