Comment by gillesjacobs

7 months ago

I'd really like to see benchmark comparisons between in-chunk augmentation approaches like this one (along with question, title, and header generation) and the hybrid retrieval approach where you match at multiple levels: first retrieve or filter on a higher-level summary, title, or header, then match the related chunks.
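To make the multi-level idea concrete, here is a rough sketch of what I mean, not a reference implementation. The names (`two_stage_retrieve`, `summary_vec`, `top_docs`, `top_chunks`) and the toy 2-d vectors are all made up for illustration; in practice the vectors would come from whatever embedding model you use.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def two_stage_retrieve(query_vec, docs, top_docs=1, top_chunks=2):
    """Hypothetical two-level retrieval: filter by summary, then rank chunks."""
    # Stage 1: keep only the documents whose summary vector best matches the query.
    ranked_docs = sorted(
        docs, key=lambda d: cosine(query_vec, d["summary_vec"]), reverse=True
    )[:top_docs]
    # Stage 2: score chunks, but only inside the documents that survived stage 1.
    candidates = [
        (cosine(query_vec, vec), d["title"], text)
        for d in ranked_docs
        for text, vec in d["chunks"]
    ]
    candidates.sort(key=lambda c: c[0], reverse=True)
    return candidates[:top_chunks]

# Toy data (assumed): the same out-of-context chunk appears in two documents.
docs = [
    {
        "title": "Billing FAQ",
        "summary_vec": [1.0, 0.0],
        "chunks": [
            ("Refunds take 5 days.", [0.9, 0.1]),
            ("It is enabled by default.", [0.5, 0.5]),
        ],
    },
    {
        "title": "Install guide",
        "summary_vec": [0.0, 1.0],
        "chunks": [
            ("It is enabled by default.", [0.5, 0.5]),
            ("Run the installer.", [0.1, 0.9]),
        ],
    },
]

results = two_stage_retrieve([1.0, 0.2], docs, top_docs=1, top_chunks=2)
for score, title, text in results:
    print(f"{score:.3f}  {title}: {text}")
```

The ambiguous chunk from the unrelated document never reaches the ranking step, which is the precision benefit I would expect from filtering at the higher level first.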

The pure vector approach of in-chunk text augmentation is much simpler, of course, but my hypothesis is that the resulting vectors will cause too many false positives in retrieval.

In my experience with vector similarity, retrieval precision is more commonly the problem than recall. This method will indeed improve recall for out-of-context chunks, but recall has rarely been a problem for me.
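For anyone who wants to measure this on their own data, a minimal sketch of precision and recall at k follows. The function name `precision_recall_at_k` and the chunk ids are invented for the example; the relevant set is assumed to come from hand-labeled queries.

```python
def precision_recall_at_k(retrieved, relevant, k):
    """Precision and recall over the top-k retrieved chunk ids."""
    top_k = retrieved[:k]
    hits = sum(1 for chunk_id in top_k if chunk_id in relevant)
    # Precision: share of returned chunks that are relevant.
    precision = hits / len(top_k) if top_k else 0.0
    # Recall: share of relevant chunks that were returned.
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# Made-up example: both relevant chunks are found, but three of five hits are noise.
retrieved = ["c1", "c7", "c3", "c9", "c4"]
relevant = {"c1", "c3"}
p, r = precision_recall_at_k(retrieved, relevant, k=5)
print(f"precision@5={p:.2f} recall@5={r:.2f}")
```

This is the pattern I tend to see: recall is already fine, while most of what comes back in the top k is irrelevant.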