Comment by iforgot22

3 days ago

If your PDF has a traditional index in it, you can read it then jump to the right page.

If, and that's one huge if, the PDF is structured so that you can do that.

Some are. Far, far, far, far, far too many aren't.

The half-assedry of PDF creation is a major frustration.

  • You mean like page 20 in the PDF isn't "page 20" in the index? Unless the pages are out of order or extra stuff is inserted, you should be able to simply add an offset. Or worst case, you binary-search the PDF like you would with a book.

    • There are various permutations.

      There are scanned-in books whose index pages don't precisely match the digital pages. Good PDFs will account for that offset themselves, but manual recalculation may be necessary.

      Worse are books half-assedly converted from print to digital. These often include an index (useful for all the reasons others have mentioned elsewhere in this thread), but the "faithful" reproduction of the print text means that the page numbers in the index bear a nonconstant relationship to the digital pages: the offset drifts through the book, so no single correction works.
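
      For what it's worth, the binary search suggested upthread still copes with a drifting offset, provided the printed page numbers never decrease as you move forward through the file. A rough sketch, assuming a hypothetical `read_label(i)` that returns the printed page number shown on digital page `i` (in practice that is you eyeballing the page, or some text extraction step); nothing here is a real PDF library call:

```python
from typing import Callable, Optional


def find_digital_page(
    read_label: Callable[[int], int],  # hypothetical: printed number on digital page i
    page_count: int,
    target: int,
) -> Optional[int]:
    """Binary-search digital pages (0-based) for one showing printed page `target`.

    Assumes printed numbers are non-decreasing across the file. Returns the
    index of a matching digital page, or None if no page carries that number.
    """
    lo, hi = 0, page_count - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        label = read_label(mid)
        if label == target:
            return mid
        if label < target:
            lo = mid + 1  # target is further into the file
        else:
            hi = mid - 1  # overshot, go back
    return None


# Toy data: some print pages span two or three digital pages,
# so the offset between the two numberings is not constant.
labels = [1, 2, 2, 3, 4, 4, 4, 5, 6, 7]
page = find_digital_page(lambda i: labels[i], len(labels), 5)
missing = find_digital_page(lambda i: labels[i], len(labels), 8)
```

      Here `page` comes out as 7 and `missing` as None. Each probe is one page flip, so even a 1,000-page file takes about ten of them. If the pages are out of order or unnumbered material is inserted mid-book, the monotonic assumption breaks and so does this.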

      Then there are ePubs with the above feature. The sane thing to do would be to link each index entry to its occurrences. Often you'll find, again, print-edition page mentions which are of little use in locating the passage within your digital edition.

      One of the underlying problems is that the print notion of "page" is increasingly archaic. For languages / typographies in which paragraphs are a useful convention, paragraph numbering might be preferable (this should be constant across formats). Direct symlinks are of course useful, but these conceal information revealed in a conventional (print) index such as passages where a topic is discussed at some length, or clusters of appearances, as well as cross-references or associated references which a well-constructed index will reveal.
