Comment by trevoragilbert

1 day ago

This is very cool! For the name extraction, how are you handling false positives across such a large dataset? I’m assuming there are mentions that could be a name but are actually just a noun. For example, Agricola being the word for farmer but also a name.

Most inscriptions are fairly formulaic, and I give the LLM examples to help it find the names. I also have a post-processing blacklist that removes some known cases that slip through. It's never going to be 100% perfect, but to my untrained eye it seems to do okay. I'm waiting on some professionals to cross-check my data. If that's you, you can search and export the data as CSV via the browse button.
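
For anyone curious, a post-processing blacklist of this kind could look roughly like the sketch below. This is not the actual code behind the site; the `BLACKLIST` entries and the `filter_names` helper are hypothetical and only illustrate the idea of dropping known non-name words after extraction.

```python
# Hypothetical blacklist of Latin common nouns that an extractor might
# mistake for personal names (entries are illustrative assumptions).
BLACKLIST = {"miles", "coniunx", "filius"}


def filter_names(candidates, blacklist=BLACKLIST):
    """Return the candidates whose normalized form is not blacklisted."""
    kept = []
    for name in candidates:
        # Normalize so "Miles" and " miles " both match the entry "miles".
        if name.strip().lower() not in blacklist:
            kept.append(name)
    return kept


print(filter_names(["Gaius", "Miles", "Agricola"]))
```

A plain blacklist like this cannot handle a word such as Agricola, which is both a noun and a real name: listing it would drop genuine names, so ambiguous cases still depend on the examples given to the LLM or on manual review.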