Comment by _pdp_
13 hours ago
I don't see this being such a big gap. There are some use cases for sure, but apart from UX/UI work it is not really needed. Besides, none of the frontier models can replicate actual images - they can only approximate them, at least in my own experience.
One of my tests for a new model is dumping in a screenshot of a web page and seeing if it can recreate it from scratch in HTML and CSS.
Even the local models I run on my Mac are getting surprisingly good at that now.
A pretty fun and quick test I do with vision models is to screenshot the Hacker News homepage and ask the model to return a JSON representation of the screenshot. Qwen 3.5 0.8B did surprisingly well at this.
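For anyone who wants to put a number on a test like this, here is a rough sketch of how the returned JSON could be scored against titles you typed in by hand. The `{"stories": [{"title": ...}]}` shape and the `score_extraction` name are my own assumptions for illustration; the comment above does not say what schema was asked for or how the result was judged.

```python
import json


def score_extraction(model_output: str, expected_titles: list[str]) -> float:
    """Return the fraction of expected titles present in the model's JSON output."""
    try:
        data = json.loads(model_output)
    except json.JSONDecodeError:
        # The model did not return valid JSON at all.
        return 0.0

    # Assumed (hypothetical) shape: {"stories": [{"title": "..."}, ...]}
    stories = data.get("stories", []) if isinstance(data, dict) else []
    if not isinstance(stories, list):
        return 0.0

    # Normalize titles so case and surrounding whitespace do not count as misses.
    found = {
        str(story.get("title", "")).strip().lower()
        for story in stories
        if isinstance(story, dict)
    }

    if not expected_titles:
        return 0.0
    hits = sum(1 for title in expected_titles if title.strip().lower() in found)
    return hits / len(expected_titles)
```

Exact title matching is strict on purpose: a small model that paraphrases or truncates a headline gets no credit, which is usually what you want to catch in this kind of test.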
Using LLMs to generate docx: being able to rasterize and review the output is an important part of the process.
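One way the rasterize step could look, as a sketch only: convert the docx to PDF with LibreOffice in headless mode, then render the pages to PNG with `pdftoppm` from poppler-utils. Both tools being installed and on the PATH is an assumption, and the function names and the `page` filename prefix are made up for illustration; the comment above does not say which tooling is actually used.

```python
import subprocess
from pathlib import Path


def rasterize_commands(docx_path: str, out_dir: str, dpi: int = 100) -> list[list[str]]:
    """Build the two commands: docx -> PDF, then PDF -> one PNG per page."""
    docx = Path(docx_path)
    out = Path(out_dir)
    # LibreOffice writes <stem>.pdf into the --outdir directory.
    pdf = out / (docx.stem + ".pdf")
    return [
        ["soffice", "--headless", "--convert-to", "pdf", "--outdir", str(out), str(docx)],
        ["pdftoppm", "-png", "-r", str(dpi), str(pdf), str(out / "page")],
    ]


def rasterize(docx_path: str, out_dir: str, dpi: int = 100) -> list[Path]:
    """Run both commands and return the page images, sorted by filename."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    for cmd in rasterize_commands(docx_path, out_dir, dpi):
        # check=True raises CalledProcessError if a tool fails.
        subprocess.run(cmd, check=True)
    return sorted(Path(out_dir).glob("page*.png"))
```

The resulting PNGs can then be passed back to a vision model (or a human) for the review pass.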