Comment by lyu07282
1 year ago
Indeed, from their conclusions:
> They [VLMs] are generally more capable of "looking past the noise" of scan lines, creases, watermarks. Traditional models tend to outperform on high-density pages (textbooks, research papers) as well as common document formats like tax forms.
Which is a bit confusing. Did they actually test that? It doesn't seem that way, given their limited dataset.