Comment by jameswhitford
9 hours ago
Hi, author here. I can't give an exact number for how many tokens the verification step took, but the verification GLM 5.2 ran was very stupid and definitely a waste of time: it read the pixel color data to try to verify that the scene rendered properly, which is really bad. Opus opened the game in a Playwright browser and took screenshots to verify the actual image, which helped a lot.
Pro tip: to get around this issue, you could have GLM 5.2 spawn a multimodal model as a subagent to verify the images.
That's a dumb way to do it; it should just write the frame buffer to a PNG instead of taking screenshots. I guess you can't take the dumb web-developer ways out of these models at the end of the day.
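For what it's worth, here is a rough sketch of the "write the frame buffer to a PNG" idea, using only the Python standard library. It assumes the frame buffer is already available as tightly packed 8-bit RGBA bytes, top row first; the function name `framebuffer_to_png` and that layout are assumptions made for illustration, not anything taken from the project discussed above.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def _chunk(kind: bytes, data: bytes) -> bytes:
    """Build one PNG chunk: length, type, data, then CRC over type + data."""
    crc = zlib.crc32(kind + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + kind + data + struct.pack(">I", crc)


def framebuffer_to_png(rgba: bytes, width: int, height: int) -> bytes:
    """Encode a raw RGBA8 frame buffer (hypothetical layout: top row first)."""
    if len(rgba) != width * height * 4:
        raise ValueError("buffer size does not match width * height * 4")

    stride = width * 4
    # PNG wants every scanline prefixed with a filter byte; 0 means "no filter".
    scanlines = b"".join(
        b"\x00" + rgba[y * stride:(y + 1) * stride] for y in range(height)
    )

    # IHDR: width, height, bit depth 8, color type 6 (RGBA),
    # compression 0, filter method 0, no interlacing.
    ihdr = struct.pack(">IIBBBBB", width, height, 8, 6, 0, 0, 0)

    return (
        PNG_SIGNATURE
        + _chunk(b"IHDR", ihdr)
        + _chunk(b"IDAT", zlib.compress(scanlines))
        + _chunk(b"IEND", b"")
    )


if __name__ == "__main__":
    # A 2x2 solid red image as a stand-in for a real frame buffer.
    pixels = bytes([255, 0, 0, 255] * 4)
    with open("frame.png", "wb") as f:
        f.write(framebuffer_to_png(pixels, 2, 2))
```

One caveat if the bytes come from WebGL's `readPixels`: that call returns rows starting from the bottom of the image, so the rows would need to be reversed before encoding. Either way, the resulting PNG still has to be looked at by something that can judge an image, so this changes how the picture is captured rather than who verifies it.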