Comment by NoMoreNicksLeft

5 days ago

>Uh-oh. Why do we have so many distinct versions of The Last Unicorn? Well, each distinct format of a work has its own ISBN (so a hardcover, paperback, and eBook all have different ISBNs),

This isn't even the half of it. On some digital books, I'll find a dozen ISBNs in the front matter. Of course there's the hardback, the clothbound (not always the same as the hardback), the alk. paper variant, paperback, trade paperback, epub, pdf, "Adobe digital", and "master digital e-book" (no idea what that even is myself). And that's all just issued together. If they reprint, it won't get a new ISBN, but if the rights convey to another publisher, that one will get a whole 'nother set again. Some popular titles likely have low hundreds of ISBNs, and keep in mind that these have only been a thing since the late 1960s (as 9-digit numbers, technically just SBNs back then). Then with the now-dead paperback trade, you could go through a dozen different covers for the most popular books (King, etc.) but they'd all use the same ISBN.
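For anyone normalizing those old 9-digit SBNs alongside modern ISBNs, here is a minimal sketch of the usual conversion: prefix a zero to get an ISBN-10 (the mod-11 check character stays valid), then prefix 978 and recompute the check digit to get an ISBN-13. The function names are made up for illustration, and the sample number is just a commonly used textbook example, not a book from this thread.

```python
def isbn10_check_char(first9: str) -> str:
    """Check character for the first nine digits of an ISBN-10 (mod 11, weights 10..2)."""
    total = sum(int(d) * w for d, w in zip(first9, range(10, 1, -1)))
    check = (11 - total % 11) % 11
    return "X" if check == 10 else str(check)


def sbn_to_isbn10(sbn: str) -> str:
    """Turn a 9-character SBN into an ISBN-10 by prefixing '0'."""
    chars = sbn.replace("-", "").replace(" ", "").upper()
    if len(chars) != 9:
        raise ValueError("an SBN has 9 characters")
    isbn10 = "0" + chars
    # The leading zero contributes nothing to the weighted sum,
    # so the original check character must still match.
    if isbn10_check_char(isbn10[:9]) != isbn10[9]:
        raise ValueError("check character does not match")
    return isbn10


def isbn10_to_isbn13(isbn10: str) -> str:
    """Prefix 978, drop the old check character, recompute with weights 1,3,1,3..."""
    core = "978" + isbn10[:9]
    total = sum(int(d) * (1 if i % 2 == 0 else 3) for i, d in enumerate(core))
    return core + str((10 - total % 10) % 10)


isbn10 = sbn_to_isbn10("306-40615-2")
print(isbn10, isbn10_to_isbn13(isbn10))
```

This only normalizes the number's form; it says nothing about which format or publisher the number was assigned to.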

Then, and this one bites me the most... if archive.org scans in a hardback with its ISBN, what do I use for the scanned pdf? I've decided that for lack of a better alternative I have to reuse the hardback's ISBN, but if the publisher made their own pdf (even just by scanning the hardback), then it is supposed to issue a new ISBN for it.

Cataloging my own library, I've had to use a hodgepodge of unique ids. ASINs, ISBNs, Worldcat's OCLC numbers, Open Library's, and a few others besides. And it still comes up short. The number of oddball publishers and pamphlets and so forth that have never been cataloged anywhere is enormous.
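One way to live with that hodgepodge is to store every identifier you have per item and derive a single key from whichever scheme is available. This is only a sketch under my own assumptions: the `CatalogEntry` class, the scheme names, the preference order, and the local fallback key are all hypothetical, and the ID values in the usage lines are placeholders.

```python
from dataclasses import dataclass, field

# Hypothetical preference order; reorder to taste.
ID_PREFERENCE = ("isbn", "oclc", "olid", "asin")


@dataclass
class CatalogEntry:
    title: str
    # Maps a scheme name (e.g. "isbn", "oclc") to the identifier string.
    ids: dict[str, str] = field(default_factory=dict)

    def primary_key(self) -> str:
        """Return 'scheme:value' for the first available scheme in ID_PREFERENCE."""
        for scheme in ID_PREFERENCE:
            value = self.ids.get(scheme)
            if value:
                return f"{scheme}:{value}"
        # Never-cataloged pamphlets and the like: fall back to a local slug.
        return "local:" + "-".join(self.title.lower().split())


print(CatalogEntry("Some Hardback", {"oclc": "PLACEHOLDER-1"}).primary_key())
print(CatalogEntry("Odd Pamphlet").primary_key())
```

Prefixing the key with its scheme keeps numbers from different catalogs from colliding, though the title-based fallback can still collide for two uncataloged items with the same title.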

>if archive.org scans in a hardback with its ISBN, what do I use for the scanned pdf?

The scanned pdf just doesn't have an ISBN. ISBNs are assigned by publishers to products for inventory management. That's it. If archive.org scans a book, it's not a product that needs inventory control.

  • I need unique identifiers. And I disagree. For me, the scan of a book keeps the same ISBN as the printed book that was scanned (when it has an ISBN at all). No other sensible alternative really exists. I also believe Open Library catalogs them the same way (since they are archive.org too, and doing much of the scanning).

    • You may need a unique identifier. You may use the ISBN if you like. No one will stop you from doing that. But no ISBN-issuing entity has assigned an ISBN to that file.

Your question has already been answered, but have you considered recording several things together: the ISBNs, a description of the book, a link to the website for this edition, the publisher, and a note with details of the book's format (hardcover, softcover, etc.)?

Personally, I have never seen all of these indicators coincide for two different books. The combination also lets you find a very specific publication with a semantic search over tags, publisher, and format.

> if archive.org scans in a hardback with its ISBN, what do I use for the scanned pdf?

Archive.org would recommend using the OpenLibrary IDs instead of ISBNs. (OpenLibrary is an Archive.org project.)

> The number of oddball publishers and pamphlets and so forth that have never been cataloged anywhere is enormous.

I think it's more that there are too many catalogs. At least with LibraryThing it always seems like somebody has cataloged everything, but we have such a hodgepodge of ID systems and catalog numbers in part because the catalogs have so rarely been connected, or even tried to be. It's only a relatively recent phenomenon that so many small library catalogs can talk to each other over the same protocol, much less coexist in the same broader search tool.

> Cataloging my own library, I've had to use a hodgepodge of unique ids. ASINs, ISBNs, Worldcat's OCLC numbers, Open Library's, and a few others besides.

In part because most of my personal catalog is in LibraryThing, I've been impressed with LibraryThing's Works ID as a generally trustworthy unique ID for a book. LibraryThing benefits from an interesting mix of volunteer and professional librarian work (especially the work of a lot of tiny and interesting niche libraries across the world) in deduping and merging editions together into the same Work ID. StoryGraph and OpenLibrary are also doing interesting things in this space, but LibraryThing has the momentum of time (it's as old as GoodReads and not an Amazon side project) and the benefit of extra (nerdy) labor.

I also like the LibraryThing IDs because they are generally short, opaque (which is a weird feature sometimes), and don't look anything like an ISBN because they aren't intended for that. StoryGraph's IDs are GUIDs, which I will forever find ugly in their normal hyphen-delimited hexadecimal rendering. Open Library's look like ISBNs for reasons that I don't understand, but I do appreciate that you can use the last letter of the ID to distinguish between an edition ID (ends in M, for reasons I don't know) and a work ID (ends in W), and the OL prefix does help them stand out next to other catalogs' IDs.
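The suffix trick is easy to sketch in code. This assumes the IDs have the shape "OL", then digits, then one letter, and it only handles the two suffixes mentioned above; the `olid_kind` name and the sample IDs are made up for illustration.

```python
import re

# Assumed shape: "OL" + digits + a single suffix letter (M or W).
_OLID = re.compile(r"^OL(\d+)([MW])$")


def olid_kind(olid: str) -> str:
    """Classify an Open Library ID as 'edition', 'work', or 'unknown' by its last letter."""
    match = _OLID.match(olid.strip().upper())
    if match is None:
        return "unknown"
    return {"M": "edition", "W": "work"}[match.group(2)]


print(olid_kind("OL123M"))
print(olid_kind("OL456W"))
print(olid_kind("9780306406157"))
```

Anything that doesn't fit the pattern, including an ISBN pasted into the wrong field, comes back as "unknown" rather than raising.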

I built a voting website for my current favorite book club, and I thought I could do everything with just the LibraryThing Works ID, but I keep adding other IDs to the "database" (YAML frontmatter) as time goes on. LibraryThing doesn't have a Covers API because most of their edition covers come from Amazon, and Amazon is restrictive about that. If I add the OpenLibrary Edition ID, I can use the OpenLibrary Covers API, as Archive.org has very nice terms on that today. (It has to be the Edition ID rather than the OpenLibrary Works ID, because covers are associated at the Edition level. That does make some sense, although the website UI shows a default cover from a random edition, so I'm not sure why the API couldn't return that cover for a Works ID. Still, it is nice to pick and choose Edition covers, and I can't complain too much about having a working cover image API from someone.)

I started adding StoryGraph IDs because members of the club love StoryGraph right now, and also because, while StoryGraph doesn't have an official API yet (it is on the roadmap), I discovered that StoryGraph's content warnings (CWs) section was amenable to easy scraping. I figured that since an API for it is on the roadmap, a bit of light scraping (with attribution!) was fair. (My club wanted CW information to help decide on book voting. LibraryThing intentionally doesn't track CWs, considering them too hot-button and subjective, but StoryGraph has a rather nice "voting" experience for CWs, and before I started scraping them we were already copying and pasting them by hand into the Markdown documents. The scraping provides better attribution and a unified display.)
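For the cover lookup described above, a rough sketch of building a Covers API URL from an Edition ID. The URL pattern (`/b/olid/<id>-<size>.jpg` with sizes S, M, L) is how I understand the Open Library Covers API to work, so check it against their docs; the `cover_url` helper and the sample IDs are hypothetical.

```python
# Assumed Covers API pattern; verify against Open Library's documentation.
COVERS_BASE = "https://covers.openlibrary.org/b/olid"
SIZES = {"S", "M", "L"}


def cover_url(edition_olid: str, size: str = "M") -> str:
    """Build a cover image URL for an Open Library Edition ID (one ending in M)."""
    if size not in SIZES:
        raise ValueError(f"size must be one of {sorted(SIZES)}")
    if not edition_olid.endswith("M"):
        # Covers hang off editions, so a Works ID (ending in W) is rejected here.
        raise ValueError("expected an Edition ID ending in 'M'")
    return f"{COVERS_BASE}/{edition_olid}-{size}.jpg"


print(cover_url("OL123M", "L"))
```

Rejecting Works IDs up front keeps a wrong ID in the frontmatter from silently producing a broken image link.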