Comment by daemonologist
3 days ago
I've found Qwen3-VL to be fairly accurate at detection (though it doesn't always catch every instance). Note that it gives answers as per-mille-ages, as if the image was 1000x1000 regardless of actual resolution or aspect ratio.
I have also not had luck with any kind of iterative/guess-and-check approach. I assume the models are all trained to one-shot this kind of thing and struggle to generalize to what are effectively relative measurements.
No comments yet
Contribute on Hacker News ↗