Comment by aa-jv
7 hours ago
I simply print anything interesting I've read online to PDF. So now I've got 30+ years of my own private offline Internet experience.
Some 80,000+ files in a directory represent an awesome database of knowledge. "$ ls *inux*" to find anything Linux-related, etc.
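For anyone wanting the same filename lookup without worrying about case (so "Linux" and "linux" both match), here is a rough Python sketch. The function name `find_by_name` and the flat, single-directory layout are assumptions for illustration, not the commenter's actual setup.

```python
from pathlib import Path


def find_by_name(archive_dir, term):
    """Return sorted names of files in archive_dir whose name contains term.

    Matching is a case-insensitive substring test on the file name only;
    file contents are not read and subdirectories are not searched.
    """
    needle = term.lower()
    return sorted(
        p.name
        for p in Path(archive_dir).iterdir()
        if p.is_file() and needle in p.name.lower()
    )
```

Calling `find_by_name("/path/to/archive", "inux")` would list every file whose name contains "inux" in any capitalization. It only looks at file names, so searching inside the PDFs still needs OCR or a text index.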
One of these days I'll get around to setting up some ML tool that will tell me all the things I didn't already osmose from the archive ... and maybe long after I'm gone, in some hole in a wall of some grimy back alley somewhere, there'll be an ML version of me embedded in a brick, ready to carry on the conversation well into the future ...
https://github.com/paperless-ngx/paperless-ngx might be a nice rabbit hole for you: drop the files in there and they'll be OCR'ed and searchable. There are also some AI projects you can give access to paperless to achieve your use case.