Comment by constantinum

6 days ago

Reading through the comments, some of the common requirements for document extraction are:

* Run locally or on-premises for security and privacy reasons

* Support multiple LLMs and vector DBs, plug and play

* Support customisable schemas

* A way to check extracted values against the source document

* Cron jobs for automation

Unstract addresses the requirements above:

https://github.com/Zipstack/unstract
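
To make the "check accuracy with source" point concrete, here is a minimal sketch of the general idea in plain Python. It is not Unstract's API; the function names, field names, and sample invoice text are all made up for illustration. It only checks whether each extracted value literally appears in the source text after normalising case and whitespace.

```python
import re


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so layout differences do not matter."""
    return re.sub(r"\s+", " ", text).strip().lower()


def verify_against_source(extracted: dict[str, str], source_text: str) -> dict[str, bool]:
    """For each field, report whether its extracted value appears in the source.

    Hypothetical helper: an empty value is always reported as unverified.
    """
    haystack = normalize(source_text)
    report = {}
    for field, value in extracted.items():
        needle = normalize(value)
        report[field] = bool(needle) and needle in haystack
    return report


# Made-up sample data, standing in for OCR text and an LLM's structured output.
source = "Invoice No: INV-1042\nTotal   due: 1,250.00 EUR"
extracted = {
    "invoice_number": "INV-1042",
    "total": "1,250.00 EUR",
    "vendor": "Acme GmbH",  # not present in the source, so it should be flagged
}

report = verify_against_source(extracted, source)
for field, ok in report.items():
    print(f"{field}: {'found in source' if ok else 'NOT found in source'}")
```

A literal substring check like this will miss values the model has reformatted (dates, currency), so in practice it is more of a first-pass flag for human review than a real accuracy metric.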