Comment by gavmor
4 days ago
The original chunk, sure, but what if the original chunk is full of eg pronouns? This is a problem I haven't heard an elegant solution for, although I've seen it done OK.
What I mean is, how can you derive topics from a chunk that refers to them only obliquely?
Before chunking, run coreference resolution to get rid of all of your pronouns and replace them with explicit references. You need to be a bit of careful to ensure you chunk both processed and unprocessed versions in the same places but it’s very doable.
If you haven’t seen it, there’s a lovely overview of the idea in one of the SpaCy blog posts: https://explosion.ai/blog/coref
Oh wow, yes, this is clever!