Comment by fastpdfai
3 hours ago
One thing I really want to find out is which model to use, and how, to process TONS of PDFs very fast and very accurately. This is for predicting invoice dates, accrual accounting, and other accounting-related purposes. So I need a decently smart model that is really good at reading PDFs and images while still being very fast.
I have a somewhat similar use case, where I need to convert the content of PDFs in a non-standard format to a specific YAML format. I currently use Haiku for this and am pleased with the accuracy and speed (I haven't tried scanned PDFs yet, though). However, I've been thinking about fine-tuning a small Qwen model for just this task. I can't yet justify the effort to investigate it, but I imagine it could work out.