Comment by WorldMaker
8 days ago
> if archive.org scans in a hardback with its ISBN, what do I use for the scanned pdf?
Archive.org would recommend using the OpenLibrary IDs instead of ISBNs. (OpenLibrary is an Archive.org project.)
> The number of oddball publishers and pamphlets and so forth that have never been cataloged anywhere is enormous.
I think it's more the case that the number of catalogs is too large. At least with LibraryThing it always seems like somebody has cataloged everything, but we have such a hodgepodge of ID systems and catalog numbers partly because the catalogs have so rarely been connected, or even tried to be. Only relatively recently have so many small library catalogs been able to talk to each other over the same protocol, much less coexist in the same broader search tool.
> Cataloging my own library, I've had to use a hodgepodge of unique ids. ASINs, ISBNs, Worldcat's OCLC numbers, Open Library's, and a few others besides.
In part because most of my personal catalog is in LibraryThing, I've been impressed with LibraryThing's Works ID as a generally trustworthy unique ID for a book. LibraryThing benefits from an interesting mix of volunteer and professional librarian work (especially from a lot of tiny and interesting niche libraries across the world) in deduping and merging editions into the same Work ID. StoryGraph and OpenLibrary are also doing interesting things in this space, but LibraryThing has the momentum of time (it's as old as Goodreads, and it isn't an Amazon side project) and the benefit of extra (nerdy) labor.
I also like the LibraryThing IDs because they are generally short, opaque (which is sometimes an odd feature to like), and look nothing like an ISBN, because they aren't meant to. StoryGraph's IDs are GUIDs, which I will forever find ugly in their usual hyphen-delimited hexadecimal rendering. Open Library's look like ISBNs for reasons I don't understand, but I do appreciate that the last letter of the ID distinguishes an edition ID (ends in M, for reasons I don't know) from a work ID (ends in W), and the OL prefix helps them stand out next to other catalogs' IDs.
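The suffix trick for Open Library IDs is easy to lean on in code. Here is a minimal sketch, assuming IDs take the form "OL" + digits + one type letter; the function name `olid_kind` is hypothetical, and it only recognizes the two suffixes mentioned above (M for editions, W for works).

```python
import re
from typing import Optional

# Assumed shape of an Open Library ID: "OL", then digits, then a one-letter type suffix.
_OLID_PATTERN = re.compile(r"^OL(\d+)([A-Z])$")

# Only the suffixes discussed above; anything else is treated as unknown.
_OLID_KINDS = {"M": "edition", "W": "work"}


def olid_kind(olid: str) -> Optional[str]:
    """Return 'edition' or 'work' for an Open Library ID, else None.

    Hypothetical helper: it checks the shape of the string only and does
    not confirm the ID exists in Open Library.
    """
    match = _OLID_PATTERN.match(olid.strip().upper())
    if match is None:
        return None  # e.g. an ISBN, a LibraryThing ID, or a StoryGraph GUID
    return _OLID_KINDS.get(match.group(2))
```

For example, `olid_kind("OL7353617M")` gives `"edition"`, `olid_kind("OL45804W")` gives `"work"`, and an ISBN-13 string gives `None` because it lacks the OL prefix.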
I built a voting website for my current favorite book club, and I thought I could do everything with just the LibraryThing Works ID, but I keep adding other IDs to the "database" (YAML frontmatter) as time goes on. LibraryThing doesn't have a Covers API, because most of its edition covers come from Amazon and Amazon is restrictive about that. If I add the OpenLibrary Edition ID, I can use the OpenLibrary Covers API, since Archive.org has very nice terms for it today. (Not the OpenLibrary Works ID, because covers are associated at the Edition level. That does make some sense, but the website UI shows a default cover from a random edition, so I'm not sure why the API couldn't return that cover for a Works ID. Still, it's nice to pick and choose Edition covers anyway, and I can't complain too much about having a working cover image API from someone.)
I started adding StoryGraph IDs because members of the club love StoryGraph right now, and also because, while StoryGraph doesn't have an official API yet (it is on the Roadmap), I discovered that StoryGraph's content warnings (CWs) section was amenable to easy scraping. I figured that since an API for it is on the Roadmap, a bit of light scraping (with attribution!) was fair. (My club wanted CW information to help decide on book voting. LibraryThing intentionally doesn't track CWs, considering them too hot-button and subjective, but StoryGraph has a rather nice "voting" experience for CWs, and before I started scraping StoryGraph's CWs we were already copying and pasting them by hand into the Markdown documents. The scraping provides better attribution and a unified display.)
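To make the cover lookup concrete, here is a minimal sketch of turning one book's frontmatter record into a cover image URL. It assumes the Open Library Covers API URL pattern `https://covers.openlibrary.org/b/olid/<OLID>-<size>.jpg` with sizes S, M, or L; the frontmatter key names (`librarything_work`, `openlibrary_edition`) and the function `cover_url` are hypothetical. It accepts only edition IDs, since covers hang off editions rather than works.

```python
from typing import Mapping, Optional

COVERS_BASE = "https://covers.openlibrary.org/b/olid"
COVER_SIZES = {"S", "M", "L"}


def cover_url(book: Mapping[str, Optional[str]], size: str = "M") -> Optional[str]:
    """Build an Open Library cover URL from a book's parsed frontmatter.

    The "openlibrary_edition" key is a hypothetical naming choice.
    Returns None when the record has no usable edition ID.
    """
    if size not in COVER_SIZES:
        raise ValueError(f"size must be one of {sorted(COVER_SIZES)}, got {size!r}")
    # `or ""` also covers a YAML null value for the key.
    olid = (book.get("openlibrary_edition") or "").strip().upper()
    # Covers are keyed by edition, so only accept "OL<digits>M" IDs, not works ("...W").
    if not (olid.startswith("OL") and olid.endswith("M") and olid[2:-1].isdigit()):
        return None
    return f"{COVERS_BASE}/{olid}-{size}.jpg"
```

A record like `{"librarything_work": "12345", "openlibrary_edition": "OL7353617M"}` yields `https://covers.openlibrary.org/b/olid/OL7353617M-M.jpg`, while a record with only a LibraryThing ID yields `None`, which the site could treat as "no cover available."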