Comment by ethin

2 months ago

So I tried this on the NVMe specification (I have a huge library of PDFs) and it worked decently, though the output had some oddities:

- Parts of the table of contents were turned into headings.

- Tables were emitted as links to separate Markdown files, which I didn't like.

In theory, I could recombine everything into one document, but that would require complicated Markdown parsing and manipulation, and I wasn't sure how to go about it given how free-form the resulting text was. I also haven't gone through the entire document (it's 784 pages) to check that it's correct compared to what pdftotext or Acrobat would produce, so there's that too.
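For what it's worth, the recombining step might not need a full Markdown parser if the table links are plain inline links. Here is a minimal sketch, assuming each table is referenced as `[label](some/path.md)` relative to the main file. That link shape is an assumption about the tool's output, and `inline_linked_markdown` is a hypothetical helper name, not part of any tool.

```python
import re
from pathlib import Path

# Matches inline Markdown links whose target ends in .md,
# e.g. [Table 3](tables/table_3.md). The lookbehind skips image links (![...](...)).
# Assumption: the converter emits table references in this inline-link form.
MD_LINK = re.compile(r"(?<!!)\[([^\]]*)\]\(([^)\s]+\.md)\)")


def inline_linked_markdown(text: str, base_dir: Path) -> str:
    """Replace each link to a local .md file with that file's contents."""

    def substitute(match: re.Match) -> str:
        target = base_dir / match.group(2)
        if not target.is_file():
            # Leave links we cannot resolve (remote URLs, missing files) untouched.
            return match.group(0)
        body = target.read_text(encoding="utf-8").strip()
        # Surround with blank lines so an inlined table starts its own block.
        return f"\n\n{body}\n\n"

    return MD_LINK.sub(substitute, text)
```

Calling `inline_linked_markdown(main_text, Path("output_dir"))` would return the main document with each resolvable link swapped for the linked file's text. It does not recurse into the inlined files, and it ignores reference-style links, so it only covers the simple case.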
