Comment by Wowfunhappy

3 days ago

Interesting thought. I wonder if Anthropic et al. could include some sort of render-HTML-to-screenshot step in the training routine, so that the rendered output gets included as training data.
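
For concreteness, a minimal sketch of that render-to-screenshot step, assuming a headless browser driven by Playwright (the HTML snippet and output path are just placeholders):

```python
# Render a generated HTML snippet to a screenshot with Playwright.
from playwright.sync_api import sync_playwright

html = "<div style='width:200px;height:100px;background:steelblue'>hello</div>"

with sync_playwright() as p:
    browser = p.chromium.launch()
    page = browser.new_page(viewport={"width": 800, "height": 600})
    page.set_content(html)              # load the model-generated markup
    page.screenshot(path="render.png")  # pixels that could feed back into training
    browser.close()
```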

Even better: a tool that can report the rendered bounding box of any set of elements, and the distances between pairs of elements, so the model can make adjustments when the relative positioning doesn't match its expectations. This would be incredible for generating SVG diagrams, too.
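
A hedged sketch of what that bounding-box tool could look like, again with Playwright. The selectors, layout, and the distance definition (gap between facing box edges) are my own assumptions, not an existing API:

```python
# Measure bounding boxes of rendered elements and pairwise horizontal gaps.
from itertools import combinations
from playwright.sync_api import sync_playwright

html = """
<div id="a" style="position:absolute; left:10px;  top:10px; width:50px; height:50px"></div>
<div id="b" style="position:absolute; left:100px; top:10px; width:50px; height:50px"></div>
"""

def horizontal_gap(b1, b2):
    """Horizontal distance between the facing edges of two boxes (0 if overlapping)."""
    left, right = sorted((b1, b2), key=lambda b: b["x"])
    return max(0.0, right["x"] - (left["x"] + left["width"]))

with sync_playwright() as p:
    browser = p.chromium.launch()
    page = browser.new_page()
    page.set_content(html)
    # bounding_box() returns {"x", "y", "width", "height"} for a visible element.
    boxes = {sel: page.locator(sel).bounding_box() for sel in ("#a", "#b")}
    for s1, s2 in combinations(boxes, 2):
        print(s1, s2, horizontal_gap(boxes[s1], boxes[s2]))  # prints "#a #b 40.0"
    browser.close()
```

A model (or a verification loop around it) could compare numbers like these against the layout it intended and regenerate when they disagree.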

That's basically a VLM, but the problem is that describing the world requires a better understanding of the world. That's why LeCun is talking about world models (it's also cutting-edge for teaching robots to manipulate objects and plan manipulations).