Comment by gavmor

4 days ago

The original chunk, sure, but what if the original chunk is full of, e.g., pronouns? This is a problem I haven't heard an elegant solution for, although I've seen it done OK.

What I mean is, how can you derive topics from a chunk that refers to them only obliquely?

Before chunking, run coreference resolution to replace all of your pronouns with explicit references. You need to be a bit careful to ensure you chunk both the processed and unprocessed versions in the same places, but it’s very doable.
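To make the "same places" part concrete, here's a rough sketch of one way to do it: split into sentences first, resolve, then cut both versions at the same sentence indices. Everything named here (`split_sentences`, `toy_resolve`, `aligned_chunks`, the hard-coded "the parser" antecedent) is made up for illustration. `toy_resolve` is only a stand-in for a real coreference model, and the sketch assumes the resolver returns exactly one output sentence per input sentence.

```python
import re
from typing import Callable, List, Tuple


def split_sentences(text: str) -> List[str]:
    # Naive splitter: break on whitespace that follows ., ! or ?
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def toy_resolve(sentences: List[str]) -> List[str]:
    # Placeholder for a real coreference model (hypothetical): it swaps the
    # pronoun "it" for one hard-coded antecedent. One output per input sentence.
    return [re.sub(r"\b[Ii]t\b", "the parser", s) for s in sentences]


def aligned_chunks(
    text: str,
    resolve: Callable[[List[str]], List[str]],
    sentences_per_chunk: int = 2,
) -> List[Tuple[str, str]]:
    """Return (original_chunk, resolved_chunk) pairs cut at the same sentences."""
    original = split_sentences(text)
    resolved = resolve(original)
    if len(resolved) != len(original):
        raise ValueError("resolver must return one sentence per input sentence")

    pairs = []
    for start in range(0, len(original), sentences_per_chunk):
        end = start + sentences_per_chunk
        pairs.append((" ".join(original[start:end]), " ".join(resolved[start:end])))
    return pairs


text = "The parser reads tokens. It builds a tree. It is fast. Users like it."
pairs = aligned_chunks(text, toy_resolve)
for original_chunk, resolved_chunk in pairs:
    print("original:", original_chunk)
    print("resolved:", resolved_chunk)
```

The idea is that you'd derive topics or embeddings from the resolved chunk and keep the original chunk as what you actually store or show, since the two line up one-to-one.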

If you haven’t seen it, there’s a lovely overview of the idea in one of the spaCy blog posts: https://explosion.ai/blog/coref