Comment by nicoburns

1 month ago

This table is informative as to exactly what lightpanda is: https://lightpanda.io/blog/posts/what-is-a-true-headless-bro...

TL;DR: It does the following:

- Fetch HTML over the network

- Parse HTML into a DOM tree

- Fetch and execute JavaScript that manipulates the DOM

But not the following:

- Fetch and parse CSS to apply styling rules

- Calculate layout

- Fetch images and fonts for display

- Paint pixels to render the visual result

- Composite layers for smooth scrolling and animations

So it's effectively a net+DOM+script-only browser with no style/layout/paint.

---

Definitely fun for me to watch as someone who is making a lightweight browser engine with a different set of trade-offs (net+DOM+style/layout/paint-only with no script)


karel-3d 1 month ago

When I previously worked on something that used headless browser agents, the ability to take a screenshot (or even a recording) was really great for debugging, so I am not sure about the "no paint". But hey, everything in life is a trade-off.

hobofan 1 month ago
Really depends on what you want to do with the agents. Just yesterday I was looking for something like this for our web access MCP server[0]. The only thing that it needs to do is visit a website and get the content (with JS support, as it's expected that most pages today use JS), and then convert that to e.g. Markdown.
I'm not too happy with the fact that Chrome is one of the most memory-hungry parts of all the MCP servers we have in use. The only thing that exceeds it in our whole stack is the ClickHouse shard, which comes with Langfuse. Especially if you are looking to build a "deep research" feature that may access a few hundred webpages in a short timeframe, having a lightweight alternative like Lightpanda can make quite the difference.
[0]: https://github.com/EratoLab/web-access-mcp
- karel-3d 1 month ago
  
  Well, these were "normal" crawlers that needed to work perfectly and deterministically (as far as possible), not probabilistically (AI); speed was not an issue. And I wanted to debug when something went wrong. So yeah, for me it was crucial to be able to record/screenshot.
  So yeah, everything is a trade-off, and we needed a different trade-off; we actually decided not to use headless Chromium, because there are slight differences, so we ended up using full Chrome (not even Chromium, again - slight differences) with xvfb. It was very, very memory hungry; but again, that was not an issue.
  (I used "agent" as in "browser agent", not "AI agent", I should be more precise I guess.)
pzo 1 month ago
Yeah, I feel the same. I think even a screenshot of part of the rendered page, or of the full page, can be useful even for machines, considering how heavy the HTML can be to parse and how expensive it is for LLM context. Sometimes a (sub)screenshot is just a better kind of compression.
- fbouvier 1 month ago
  
  Yes, HTML is too heavy and too expensive for LLMs. We are working on a text-based format more suitable for AI.
  

warpech 1 month ago

> So it's effectively a net+DOM+script-only browser with no style/layout/paint.

> ---

> Definitely fun for me to watch as someone who is making a lightweight browser engine with a different set of trade-offs (net+DOM+style/layout/paint-only with no script)

Both projects (Lightpanda, DioxusLabs/blitz) sound very interesting to me. What do you think about rendering patterns that require both script+layout for rendering, e.g. virtual scrolling of large tables?

What would be a good pattern to make virtual scrolling work with Lightpanda or Blitz?

nicoburns 1 month ago
So Blitz does technically have scripting; it's just Rust scripting rather than JavaScript scripting. So the plan for virtual scrolling would likely be to implement it in Rust.
If your aim is to render a UI (à la Electron/Flutter), then we have a React-style framework (Dioxus) that runs on top of Blitz and gives you access to the low-level Rust API of the DOM for advanced use cases (although it's still a WIP and this API is a bit rough atm). I'm also hoping to eventually have a built-in `RecyclerView`-like widget for this (one that can bypass the style/layout systems for much more efficient virtual scrolling).
- warpech 1 month ago
  
  Thanks! But I meant JS based virtual scrolling in web pages. E.g. dynamic data tables that only render the part of the table that fits in the viewport.
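To make the pattern concrete, here is a minimal sketch of the arithmetic a JS virtual-scrolling table typically performs, written in Python for brevity. The function name `visible_rows`, the fixed row height, and the `overscan` parameter are illustrative assumptions and not part of Lightpanda or Blitz. Every input (scroll offset, viewport height, row height) comes from layout, which is why the pattern needs both script and layout.

```python
def visible_rows(scroll_top: float, viewport_height: float,
                 row_height: float, total_rows: int,
                 overscan: int = 0) -> range:
    """Return the row indices a virtual scroller would actually render.

    Assumes every row has the same fixed height (a simplification).
    """
    if row_height <= 0:
        raise ValueError("row_height must be positive")
    # First row that is at least partly inside the viewport.
    first = int(scroll_top // row_height)
    # Ceiling division: one past the last row that is at least partly in view.
    last = int(-(-(scroll_top + viewport_height) // row_height))
    # Optionally render a few extra rows on each side, clamped to the table.
    first = max(0, first - overscan)
    last = min(total_rows, last + overscan)
    return range(first, max(first, last))


# A 10,000-row table in a 600px viewport with 30px rows renders ~20 rows.
rows = visible_rows(scroll_top=45, viewport_height=600,
                    row_height=30, total_rows=10_000)
print(len(rows), "of 10000 rows rendered")
```

Without real layout values for the scroll offset and viewport height, a script like this has nothing meaningful to compute from, so an engine lacking layout has to pick some fallback answer.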
krichprollsch 1 month ago

For scrolling, when using Intersection Observer, we currently assume all elements are visible. So, if you register an observer, we will dispatch an entry indicating an intersection with a ratio of 1.0.
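
As a rough illustration of the behavior described above, here is a toy model in Python. It is not Lightpanda's code: the `Entry` and `AlwaysVisibleObserver` names are hypothetical, strings stand in for DOM elements, and the callback runs synchronously, whereas the real Intersection Observer API delivers entries asynchronously.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Entry:
    target: str                # stand-in for a DOM element
    is_intersecting: bool
    intersection_ratio: float


class AlwaysVisibleObserver:
    """Toy model: every observed target is reported as fully visible."""

    def __init__(self, callback: Callable[[List[Entry]], None]) -> None:
        self._callback = callback

    def observe(self, target: str) -> None:
        # No layout exists, so no real intersection is computed;
        # the target is simply reported with a ratio of 1.0.
        self._callback([Entry(target, True, 1.0)])


seen: List[Entry] = []
observer = AlwaysVisibleObserver(seen.extend)
for row in ("row-0", "row-500", "row-9999"):
    observer.observe(row)

# Every row reports as visible, however far down the table it sits.
print([(e.target, e.intersection_ratio) for e in seen])
```

Under this model, a page that loads more rows when a sentinel element becomes visible would likely fire its callback as soon as the observer is registered, so lazily loaded content would tend to load eagerly rather than on scroll.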