Comment by lyu07282

1 year ago

Indeed, from their conclusions:

> They [VLMs] are generally more capable of "looking past the noise" of scan lines, creases, watermarks. Traditional models tend to outperform on high-density pages (textbooks, research papers) as well as common document formats like tax forms.

Which is a bit confusing. Did they actually test that? It doesn't seem that way, given their limited dataset.