Comment by ethin
4 days ago
So I tried this on the NVMe specification (I have a huge library of PDFs) and it worked decently, though the output had some oddities:
- Parts of the table of contents were rendered as headings.
- Tables were split out into separate Markdown files and linked from the main document, which I didn't like.
In theory, I could recombine everything into one document, but that would require complicated Markdown parsing and manipulation, and I wasn't sure how to go about it given how free-form the resulting text was. I also haven't gone through the entire document (it's 784 pages) to check it against what pdftotext or Acrobat would produce, so there's that too.
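For the simplest case, the recombining step might not need a full Markdown parser. The sketch below is only a rough idea and rests on an assumption the comment does not confirm: that each table is referenced by a line containing nothing but a link of the form `[label](relative/path.md)`. The names `inline_linked_tables`, `recombine`, and `read_target` are hypothetical and not part of any tool.

```python
import re
from pathlib import Path

# Assumed link shape: a line containing only [label](relative/path.md).
LINK_LINE = re.compile(r"^\s*\[(?P<label>[^\]]*)\]\((?P<target>[^)\s]+\.md)\)\s*$")


def inline_linked_tables(text, read_target):
    """Replace each link-only line with the content of the linked file.

    read_target is a callable that maps a link target to the file's text,
    or returns None when the target cannot be found.
    """
    out = []
    for line in text.splitlines():
        match = LINK_LINE.match(line)
        content = read_target(match.group("target")) if match else None
        if content is None:
            # Not a link-only line, or the target is missing: keep the line as-is.
            out.append(line)
        else:
            out.append(content.strip("\n"))
    return "\n".join(out) + "\n"


def recombine(main_path):
    """Inline tables linked from main_path, resolving targets relative to it."""
    main_path = Path(main_path)

    def read_target(target):
        candidate = main_path.parent / target
        return candidate.read_text(encoding="utf-8") if candidate.is_file() else None

    return inline_linked_tables(main_path.read_text(encoding="utf-8"), read_target)
```

Links that sit in the middle of a sentence are left alone on purpose, since replacing them with a whole table would break the surrounding paragraph. If the real output uses a different link shape, the regular expression is the part that would need to change.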