Comment by mccoyb

1 day ago

Just from personal experience, visual design is the task with the worst outcomes for Claude Code (w/ latest Opus 4.1, etc).

It truly cannot reason well yet about geometry, visual aesthetics, placement, etc. The performance varies: it's quite good at matplotlib but terrible at any non-trivial LaTeX / TikZ layout or graphic. Why? I don't have a clear idea yet -- would love to think more about it.

I've tried many things now to try and give it eyes (via text), and this is unavoidably a place where things are ... rough ... right now.
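To make "eyes (via text)" concrete, here is a minimal sketch of one form such feedback could take: dump each element's bounding box as a line of text and flag overlaps. This is an illustration only, not a description of what I actually ran. The `Box` type, the `describe` helper, and the sample coordinates are all made up for the example.

```python
from itertools import combinations
from typing import NamedTuple


class Box(NamedTuple):
    """Hypothetical axis-aligned bounding box: (x, y) is one corner, w/h the size."""
    name: str
    x: float
    y: float
    w: float
    h: float


def overlap_area(a: Box, b: Box) -> float:
    """Return the overlapping area of two boxes, or 0.0 if they do not overlap."""
    dx = min(a.x + a.w, b.x + b.w) - max(a.x, b.x)
    dy = min(a.y + a.h, b.y + b.h) - max(a.y, b.y)
    return dx * dy if dx > 0 and dy > 0 else 0.0


def describe(boxes: list[Box]) -> str:
    """Render a plain-text layout report that can be pasted back to the model."""
    lines = [f"{b.name}: x={b.x:g} y={b.y:g} w={b.w:g} h={b.h:g}" for b in boxes]
    for a, b in combinations(boxes, 2):
        area = overlap_area(a, b)
        if area > 0:
            lines.append(f"OVERLAP {a.name} / {b.name}: area={area:g}")
    return "\n".join(lines)


# Made-up layout: the legend intrudes into the title, the plot is clear of both.
boxes = [
    Box("title", 0, 0, 100, 20),
    Box("legend", 80, 10, 40, 30),
    Box("plot", 0, 40, 100, 60),
]
report = describe(boxes)
print(report)
```

A report like this catches hard collisions, but it says nothing about whether the result looks good, which is where the text-only loop keeps stalling.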

I've had bad results with image screenshotting. More often than not, it has no idea what it is looking at -- it won't even summarize the image correctly -- or it will confidently give me a wrong take: "Yes indeed we fixed the problem as you can tell by <this part of the UI> and <that part of the UI>".

I typically have to come in and make a bunch of fine-grained changes to get something visually appealing. I'm sure at some point we'll have a system which can go all the way, and I'd be excited to find approaches to this problem.

Note -- visual design tasks where I've run into diminishing returns:
* proper academic figures (had a good laugh at the GPT 5 announcement issues)
* video game UI / assets
* UI design for IDEs
* responsive web design for chat-based interfaces

All of these feel like "pelican" tasks -- they enter into a valley which can't be effectively communicated via textual feedback yet ...

Just reflecting on my own comment -- what one might want is an automated layout system with a simple "natural language"-like API (perhaps similar to Penrose, although it's been a while since I looked at that project).

Hardened, long-used systems like TikZ do, of course, have something like this -- but in complex TikZ graphics you end up with a mixture of relative placement ("right of", "left of", etc.) and low-level manual specification, which I think tends to fall into the same zone of issues.
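As a rough sketch of the kind of API I have in mind, here is a toy layout resolver where every placement except one anchor is relative. It is loosely modeled on the "right of" style of placement and is not Penrose's or TikZ's actual API. The `Layout` class, its method names, the default sizes, and the y-up coordinate convention are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Box:
    """Resolved geometry: (x, y) is the lower-left corner, y grows upward."""
    x: float
    y: float
    w: float
    h: float


class Layout:
    """Hypothetical toy layout: one absolute anchor, everything else relative."""

    def __init__(self, gap: float = 1.0):
        self.gap = gap    # spacing inserted between related boxes
        self.boxes = {}   # name -> resolved Box

    def at(self, name, x, y, w=2.0, h=1.0):
        """Absolute placement; ideally used once, for the anchor."""
        self.boxes[name] = Box(x, y, w, h)
        return self

    def right_of(self, name, ref, w=2.0, h=1.0):
        """Place `name` to the right of `ref`, bottom edges aligned."""
        r = self.boxes[ref]
        return self.at(name, r.x + r.w + self.gap, r.y, w, h)

    def below(self, name, ref, w=2.0, h=1.0):
        """Place `name` under `ref`, left edges aligned."""
        r = self.boxes[ref]
        return self.at(name, r.x, r.y - self.gap - h, w, h)


layout = (
    Layout(gap=1.0)
    .at("input", 0.0, 0.0)
    .right_of("model", "input")
    .below("loss", "model")
)
for name, box in layout.boxes.items():
    print(name, box)
```

The trouble starts exactly where this sketch stops: as soon as a figure needs something the relative vocabulary can't express, you fall back to `at(...)` with hand-tuned numbers, and the model is reasoning about raw coordinates again.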