Comment by karel-3d
12 hours ago
When I was working before on something that used headless browser agents, the ability to do a screenshot (or even a recording) was really great for debugging... so I am not sure about the "no paint". But hey everything in life is a trade-off.
Really depends on what you want to do with the agents. Just yesterday I was looking for something like this for our web access MCP server[0]. The only thing that it needs to do is visit a website and get the content (with JS support, as it's expected that most pages today use JS), and then convert that to e.g. Markdown.
I'm not too happy with the fact that Chrome is one of our memory-hungriest parts of all the MCP servers we have in use. The only thing that exceeds that in our whole stack is the Clickhouse shard, which comes with Langfuse. Especially if you are looking to build a "deep research" feature that may access a few hundreds of webpages in a short timeframe, having a lightweight alternative like Lightpanda can make quite the difference.
[0]: https://github.com/EratoLab/web-access-mcp
yeah I feel the same, I think even having a screenshot of part of rendered page or full page can be useful even for machines considering how heavy those HTML can be to parse and expensive for LLM context. Sometimes (sub)screenshot is just a better kind of compression
Yes HTML is too heavy and too expensive for LLM. We are working on a text-based format more suitable for AI.
What do you think of the DeepSeek OCR approach where they say that vision tokens might better compress a document than its pure text representation?
https://news.ycombinator.com/item?id=45640594
I've spent some time feeding llm with scrapped web pages and I've found that retaining some style information (text size, visibility, decoration image content) is non trivial.
1 reply →