← Back to context Comment by sampton 3 months ago You can train a cnn to find bounding boxes of text first. Then run VLM on each box. 0 comments sampton Reply No comments yet Contribute on Hacker News ↗
No comments yet
Contribute on Hacker News ↗