Comment by tweezy
4 days ago
We do this as well with a lot of success. It’s cool to see others kinda independently coalescing around this solution.
What we find really effective: at content-ingestion time, we prepend “decorator text” to the document or chunk. This incorporates metadata about the document (title, author(s), publication date, etc.).
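A minimal sketch of that decorator step (the field names here are illustrative, not an exact schema):

```python
# Ingestion-time "decorator text": prepend a metadata header to each
# chunk before embedding/indexing, so the chunk carries its context.
# The field names below are assumptions for illustration.

def decorate_chunk(chunk: str, meta: dict) -> str:
    header = "\n".join([
        f"Title: {meta.get('title', 'Unknown')}",
        f"Author(s): {', '.join(meta.get('authors') or ['Unknown'])}",
        f"Published: {meta.get('published', 'Unknown')}",
        f"Content type: {meta.get('content_type', 'Unknown')}",
    ])
    return f"{header}\n---\n{chunk}"

print(decorate_chunk(
    "Revenue grew 12% year over year...",
    {"title": "Quarterly Outlook", "authors": ["J. Smith"],
     "published": "2024-03-01", "content_type": "report"},
))
```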
Then at query time, we generate a contextual hypothetical document that matches the format of the decorator text.
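That step is essentially HyDE constrained to the decorator format. A rough sketch, with `call_llm` standing in for whatever completion API is in use:

```python
# Query-time hypothetical document in the same decorator format.
# Embedding this (instead of, or alongside, the raw query) lines the
# query up with how the chunks were indexed. `call_llm` is a stand-in,
# not a real API.

HYDE_PROMPT = """\
Write a short hypothetical document that would answer the query below.
Start with this exact metadata header format:

Title: <plausible title>
Author(s): <plausible author>
Published: <plausible date>
Content type: <plausible type>
---
<one-paragraph body that answers the query>

Query: {query}
"""

def hypothetical_document(query: str, call_llm) -> str:
    return call_llm(HYDE_PROMPT.format(query=query))
```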
We layer hybrid search (BM25 plus reranking) on top of that, and also add filters (documents published between these dates, by this author, this type of content, etc.). We have an LLM parameterize those filters and use them as part of our retrieval step, as sketched below.
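The filter parameterization is plain structured extraction; the schema and `call_llm` below are placeholders, not the actual setup:

```python
# LLM-parameterized retrieval filters: extract structured filter values
# from the natural-language query, then apply them as metadata
# predicates on retrieval candidates. The schema is an assumption.
import json

FILTER_PROMPT = """\
Extract search filters from the query as JSON with these keys
(use null for any key not mentioned): "author", "published_after",
"published_before", "content_type".

Query: {query}
JSON:"""

def parameterize_filters(query: str, call_llm) -> dict:
    return json.loads(call_llm(FILTER_PROMPT.format(query=query)))

def apply_filters(candidates: list[dict], f: dict) -> list[dict]:
    # ISO-8601 date strings compare correctly as plain strings.
    kept = []
    for c in candidates:
        if f.get("author") and f["author"] not in c.get("authors", []):
            continue
        if f.get("content_type") and c.get("content_type") != f["content_type"]:
            continue
        if f.get("published_after") and c.get("published", "") < f["published_after"]:
            continue
        if f.get("published_before") and c.get("published", "") > f["published_before"]:
            continue
        kept.append(c)
    return kept
```

The extracted filters narrow the candidate set before (or alongside) the BM25 and vector retrieval, and the survivors go to the reranker.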
This process works incredibly well for end users.