Comment by exclipy
11 days ago
Free idea: turn this into an MCP server. Give the agent the ability to virtually "hover" a path and see which part of the final render it corresponds to.
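For illustration only, here's a rough sketch of what the "hover" step could look like. This isn't code from this thread or from any existing tool. The function name `highlight_path` and the highlight styling are made up, it assumes the SVG declares the standard SVG namespace, and it leaves out both the MCP wiring and the rasterization step.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"


def highlight_path(svg_text: str, index: int) -> str:
    """Return a copy of the SVG with the index-th <path> highlighted.

    Hypothetical helper: the chosen path gets a red stroke and every
    other path is dimmed, so a render of the result shows which part
    of the image that path is responsible for.
    """
    # Keep the default namespace unprefixed when serializing.
    ET.register_namespace("", SVG_NS)
    root = ET.fromstring(svg_text)

    # Assumes paths live in the standard SVG namespace.
    paths = list(root.iter(f"{{{SVG_NS}}}path"))
    if not 0 <= index < len(paths):
        raise IndexError(
            f"path index {index} out of range (found {len(paths)} paths)"
        )

    for i, path in enumerate(paths):
        if i == index:
            path.set("stroke", "red")
            path.set("stroke-width", "3")
        else:
            path.set("opacity", "0.2")

    return ET.tostring(root, encoding="unicode")
```

The returned SVG would still need to be rendered to an image and handed back to the model, which is the part a server would have to add on top.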
If anyone sees this: I tried it, and unfortunately I'm not getting better results on the pelican-on-bicycle test. I think the vision models just aren't good enough yet (I tried Claude and Gemini).
I can share the code if there's interest.