Comment by simonw

14 hours ago

This thing is very impressive.

The problem it solves is efficiently calculating the height of some wrapped text on a web page, without actually rendering that text to the page first (very expensive).

It does that by pre-calculating the width/height of individual segments - think words - and caching those. Then it implements the full algorithm for how browsers construct text strings by line-wrapping those segments using custom code.
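
To make the idea concrete, here is a minimal sketch of "measure segments once, cache them, then wrap with custom code". It is not Pretext's implementation: the names `cachedMeasure`, `countLines` and `fakeMeasure` are hypothetical, the measurer is a stand-in (10px per character), and the wrapping is plain greedy breaking on spaces with none of the hyphenation, emoji or CJK handling a real library needs.

```typescript
// Hypothetical sketch; none of these names come from Pretext.
type Measure = (segment: string) => number;

// Wrap a measurer so each distinct segment is measured only once.
function cachedMeasure(measure: Measure): Measure {
  const cache = new Map<string, number>();
  return (segment) => {
    let width = cache.get(segment);
    if (width === undefined) {
      width = measure(segment);
      cache.set(segment, width);
    }
    return width;
  };
}

// Greedy line breaking over space-separated words; returns the line count.
function countLines(text: string, maxWidth: number, measure: Measure): number {
  const words = text.split(/\s+/).filter((w) => w.length > 0);
  if (words.length === 0) return 0;
  const spaceWidth = measure(" ");
  let lines = 1;
  let lineWidth = 0;
  for (const word of words) {
    const wordWidth = measure(word);
    if (lineWidth === 0) {
      lineWidth = wordWidth; // first word on a line is placed even if it overflows
    } else if (lineWidth + spaceWidth + wordWidth <= maxWidth) {
      lineWidth += spaceWidth + wordWidth;
    } else {
      lines += 1;
      lineWidth = wordWidth;
    }
  }
  return lines;
}

// Stand-in measurer (assumption): 10px per character.
// In a browser this would be backed by something like canvas measureText().
const fakeMeasure: Measure = (s) => s.length * 10;
const demoMeasure = cachedMeasure(fakeMeasure);
const demoLineHeight = 20;
const demoHeight = countLines("the quick brown fox", 100, demoMeasure) * demoLineHeight;
```

With these stand-in numbers the text wraps to two lines ("the quick" / "brown fox"), so `demoHeight` is 40.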

This is absurdly hard because of the many different types of wrapping and characters (hyphenation, emoji, Chinese, etc) that need to be taken into account - plus the fact that different browsers (in particular Safari) have slight differences in their rendering algorithms.

It tests the resulting library against real browsers using a wide variety of long text documents, see https://github.com/chenglou/pretext/tree/main/corpora and https://github.com/chenglou/pretext/blob/main/pages/accuracy...

22 comments


layer8 2 hours ago

> It does that by pre-calculating the width/height of individual segments - think words - and caching those.

From the description, it doesn’t calculate it, but instead renders the segments in canvas and measures them. That’s still relatively slow compared to what native rendered-text-width APIs will do, and you have to hope that the browser’s rendering will use the identical logic in non-canvas contexts.
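
A rough sketch of what canvas-based segment measurement can look like, assuming only the standard `font` property and `measureText()` method of a 2D canvas context. The `TextMeasurer` interface, `measureSegments` and the stub context are hypothetical names for illustration, not anything taken from the library.

```typescript
// Minimal structural type covering the part of CanvasRenderingContext2D used here.
interface TextMeasurer {
  font: string;
  measureText(text: string): { width: number };
}

// Measure the width of each distinct segment for a given CSS font string.
function measureSegments(
  ctx: TextMeasurer,
  font: string,
  segments: string[],
): Map<string, number> {
  ctx.font = font; // should match the CSS font of the element the text ends up in
  const widths = new Map<string, number>();
  for (const segment of segments) {
    if (!widths.has(segment)) {
      widths.set(segment, ctx.measureText(segment).width);
    }
  }
  return widths;
}

// Stub context (assumption: 8px per character) so the sketch runs outside a browser.
const stubCtx: TextMeasurer = {
  font: "",
  measureText: (text) => ({ width: text.length * 8 }),
};
const stubWidths = measureSegments(stubCtx, "16px sans-serif", ["Hello", " ", "Hello"]);

// In a browser, a real context could be passed instead:
//   const ctx = document.createElement("canvas").getContext("2d")!;
//   const widths = measureSegments(ctx, "16px sans-serif", ["Hello", " ", "world"]);
```

Whether the widths reported this way match what the browser uses when laying out DOM text is exactly the open question raised here; the sketch does nothing to guarantee it.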

spoiler 2 hours ago

I recently battled this and reverted to using DOM measurements. In my case the measurement would be off by around a pixel, which caused layout issues if I tried rendering the text in the DOM. This was only happening on some Linux and Android setups.

leeoniya 9 hours ago

i wrote something similar for this purpose, but much simpler and in 2kb, without AI, about a year ago.

uWrap.js: https://github.com/chenglou/pretext/issues/18

there are already significant perf improvement PRs open right now, including one done using autoresearch.

simonw 9 hours ago
Looks like uWrap only handles latin characters and doesn't deal with things like soft hyphens or emoji correction, plus uWrap only handles white-space: pre-line while Pretext doesn't handle pre-line but does handle both normal and pre-wrap.
- leeoniya 9 hours ago
  
  correct, it was meant for estimating row height for virtualizing a 100k row table with a latin-ish LTR charset (no emoji handling, etc). its scope is much narrower. still, the difference in perf is significant, which i have found to be true in general of AI-generated greenfield code.
  
eviks 5 hours ago
uWrap demo has text extending beyond text boxes all over the place on Safari, is that the price of simplicity?
- leeoniya 9 minutes ago
  
  i don't have a mac to test this with currently, so hopefully it's not the price but a matter of adding a Safari-specific adjustment :)
  internally it still uses the Canvas measureText() API, so there's nothing fundamentally that should differ unless Safari has broken measureText, which tbh, would not be out of character for that browser.
  
liuliu 9 hours ago
prepare() uses measureText(); if it is called in a for loop, it won't be fast. This library is meant to run prepare() once and then layout() many times. layout() calls should be sub-1 ms.
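
A hedged sketch of that prepare-once, layout-many split. `prepareSketch` and `layoutSketch` are hypothetical names and signatures chosen for illustration, not the library's real API, and the measurer is a stand-in at 10px per character. The point is only that every measurement call sits in the first phase, while the second phase is arithmetic over stored widths.

```typescript
// Hypothetical names and shapes; not the library's real signatures.
interface PreparedText {
  wordWidths: number[];
  spaceWidth: number;
}

// Phase 1 (slow, once per text): every measurement call happens here.
function prepareSketch(text: string, measure: (s: string) => number): PreparedText {
  const words = text.split(/\s+/).filter((w) => w.length > 0);
  return { wordWidths: words.map((w) => measure(w)), spaceWidth: measure(" ") };
}

// Phase 2 (fast, repeatable): greedy wrapping using only the stored widths.
function layoutSketch(
  prepared: PreparedText,
  maxWidth: number,
  lineHeight: number,
): { lineCount: number; height: number } {
  const { wordWidths, spaceWidth } = prepared;
  if (wordWidths.length === 0) return { lineCount: 0, height: 0 };
  let lineCount = 1;
  let lineWidth = 0;
  let first = true;
  for (const width of wordWidths) {
    if (first) {
      lineWidth = width;
      first = false;
    } else if (lineWidth + spaceWidth + width <= maxWidth) {
      lineWidth += spaceWidth + width;
    } else {
      lineCount += 1;
      lineWidth = width;
    }
  }
  return { lineCount, height: lineCount * lineHeight };
}

// Stand-in measurer (assumption): 10px per character, counting its own calls.
let measureCalls = 0;
const countingMeasure = (s: string): number => {
  measureCalls += 1;
  return s.length * 10;
};

const preparedDemo = prepareSketch("the quick brown fox", countingMeasure);
const narrowLayout = layoutSketch(preparedDemo, 100, 20); // 2 lines, height 40
const wideLayout = layoutSketch(preparedDemo, 400, 20); // 1 line, height 20
```

After both layouts `measureCalls` is still 5 (four words plus the space), since resizing only re-runs the arithmetic phase.
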
- leeoniya 9 hours ago
  
  it is not clear from the API/docs how i would use prepare() once on one text and then use layout() for completely different text.
  i think the intended purpose is that your text is maybe large but static and your layout just changes quickly. this is not the case for figuring out the height of 100k rows of different texts in a table, for example.
  
contrahax 8 hours ago

There's a handful of perf related PRs open already so maybe it will be faster soon. I'm sure with enough focus on it we could have a hyper optimized version in a few hours.

rikroots 14 hours ago

> This thing is very impressive.

Agreed! Text layout engines are stupidly hard. You start out thinking "It's a hard task, but I can do it" and then 3 months later you find yourself in a corner screaming "Why, Chinese? Why do you need to rotate your punctuation differently when you render in columns??"

This effort feeds back to the DOM, making it far more useful than my efforts which are confined to rendering multiline text on a canvas - for example: https://scrawl-v8.rikweb.org.uk/demo/canvas-206.html

eviks 5 hours ago
Why do you bring up Chinese corner cases if the basic Latin text in the Pretext demo is deficient?
(by the way, in your cool demo the wheel template can have some letter parts like the top of L or d extend beyond the wheel)
- rikroots 2 hours ago
  
  > the wheel template can have some letter parts like the top of L or d extend beyond the wheel
  Yeah - I use the template (in that case, a circle) to calculate line lengths, then I run 2d text along the 1d lines. Even if I tried to keep all of the glyphs inside the wheel I'd fail - because some fonts lie about how tall they are. Fonts are, basically, criminals.
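
A small geometric sketch of why that happens, assuming the line length is taken as the circle's chord at the baseline. This is a guess at the general approach, not the demo's actual code, and `chordWidth` and `lineLengths` are hypothetical helper names.

```typescript
// Width of the horizontal chord of a circle at vertical offset dy from its centre.
function chordWidth(radius: number, dy: number): number {
  if (Math.abs(dy) >= radius) return 0;
  return 2 * Math.sqrt(radius * radius - dy * dy);
}

// One line length per baseline, stepping down through the circle.
function lineLengths(radius: number, lineHeight: number): number[] {
  const lengths: number[] = [];
  for (let dy = -radius + lineHeight; dy < radius; dy += lineHeight) {
    lengths.push(chordWidth(radius, dy));
  }
  return lengths;
}

// Radius 50, baseline 30px above the centre, ascenders reaching 10px higher.
const chordAtBaseline = chordWidth(50, -30); // 2 * sqrt(2500 - 900) = 80
const chordAtAscender = chordWidth(50, -40); // 2 * sqrt(2500 - 1600) = 60
const wheelLines = lineLengths(50, 20); // baselines at dy = -30, -10, 10, 30
```

In the upper half of the circle the chord at the top of a tall glyph is shorter than the chord at its baseline (60 versus 80 here), so text that fills the baseline chord has an L or d at the edge of the line poking outside the circle.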

jimkleiber 14 hours ago

I had struggled so much to measure text and number of lines when creating dynamic subtitles for remotion videos, not sure if it was my incompetence or a complexity with the DOM itself. I feel hopeful this will make it much easier :-)

TacticalCoder 11 hours ago

> The problem it solves is efficiently calculating the height of some wrapped text on a web page, without actually rendering that text to the page first (very expensive).

But in the end, in a browser, the actual text rendering is still done by the browser?

It's a library that allows you to "do stuff" before the browser renders the actual text, but by still having the browser render, eventually, the actual text?

Or is this thing actually doing the final rendering of the text too?

simonw 11 hours ago
Yes the browser still renders the text at the end - but you can now do fancy calculations in advance to decide where you're going to ask the browser to draw it.
- tadfisher 10 hours ago
  
  I suspect exposing the browser's text layout measurer is going to become a Web API in the future, much like exposing the HTML parser via setHTML().

slopinthebag 2 hours ago

I mentioned this elsewhere, but it's not actually doing any of what you describe. It's rendering the text to a canvas and then measuring that. I don't see any benchmarks that indicate it's faster than just sticking it in a <p> tag, and not any clear indication that it would be. It's certainly not implementing the full algorithm for text rendering in a browser.

It certainly seems to provide an API for analysing text layouts, but all of the computation still goes through the browser's native layout system.