Comment by mdahardy
3 months ago
We have an example of a failed cross-tile result in the article - the models seem like they're much better at detecting whether something is in an image vs. identifying the boundaries of those items. This probably has to do with how they're trained - if you train on descriptions/image pairs, I'm not sure how well that does at learning boundaries.
Reload are challenging because of how the agent-action loop works. But the models were pretty good at identifying when a tile contained an item.
No comments yet
Contribute on Hacker News ↗