Comment by eesmith

18 days ago

I still think you should try a book which has been studied much less than the ones you mentioned. The LLM is almost certainly trained on Wikipedia, which has a lot of this information, and on plenty of essays written for high-school-level assignments.

I found 'Annotating Characters in Literary Corpora: A Scheme, the CHARLES Tool, and an Annotated Novel' at https://aclanthology.org/L16-1028/ which describes some manual annotation efforts for Pride and Prejudice. I don't know if the result is available, but the text suggests it is.

It points out a fun observation: "characters maybe referred to by multiple names, sometimes drastically different (e.g. Dr. Jekyll and Mr. Hyde)"
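
To show why that matters for counting characters, here is a minimal Python sketch. The alias table and the mention list are made up for illustration; they are not from the paper or the CHARLES tool, and a real table would presumably come out of manual annotation.

```python
from collections import Counter

# Hypothetical alias table (an assumption for this sketch):
# every surface name maps to one canonical character.
ALIASES = {
    "Dr. Jekyll": "Henry Jekyll",
    "Jekyll": "Henry Jekyll",
    "Mr. Hyde": "Henry Jekyll",
    "Hyde": "Henry Jekyll",
    "Mr. Utterson": "Gabriel Utterson",
    "Utterson": "Gabriel Utterson",
}


def count_characters(mentions, aliases=ALIASES):
    """Count mentions per canonical character.

    Names missing from the alias table are counted under their own name.
    """
    return Counter(aliases.get(name, name) for name in mentions)


# Made-up mention list, as if extracted from a text.
mentions = ["Mr. Hyde", "Utterson", "Dr. Jekyll", "Hyde", "Poole"]
counts = count_characters(mentions)
print(counts)
```

Without the table, "Mr. Hyde", "Hyde" and "Dr. Jekyll" would each be counted as a separate character, which is presumably one way a naive count gets inflated.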

Huh. https://aclanthology.org/2022.latechclfl-1.10.pdf says "that the character networks of translations differ from originals in case of long novels, and the differences may also vary depending on the novel and translator’s strategy."

Ooo, it cites https://theseaofbooks.com/2016/04/29/the-5-least-important-c... which is about the 5 least important characters in Pride and Prejudice:

> So if you filled out our reader survey and are fairly sure you didn’t come across 117 people in Pride and Prejudice last time you read it, this is because when we compiled that list, we added every last entity that could possibly be considered a character. In fact, Pride and Prejudice has a small cast of characters, compared to certain of our other novels. Ever wanted to know the population of Middlemarch, for example? By our reckoning, it’s the tidy figure of 333! (Admittedly, some of them are goats.)

This might be useful: "Using Citizen Science to study literary social networks" at https://txtlab.org/2024/12/using-citizen-science-to-study-li...

> By mobilizing volunteers to annotate character interactions, we gathered a high-quality dataset of 13,395 labeled interactions from contemporary fiction and non-fiction books. This dataset forms the foundation for understanding how genres and audience factors influence the social structures in narratives.

This appears to be an interesting field, which I have no time to explore any further. :(