Comment by exclipy

11 days ago

Free idea: turn this into an MCP server. Give the agent the ability to virtually "hover" over a path and see which part of the final render it corresponds to.

If anyone sees this: I tried it, and unfortunately I'm not getting better results on the pelican-on-a-bicycle test. I think the vision models just aren't good enough yet (I tried Claude and Gemini).

I can share the code if there's interest.