Comment by yaj54
5 days ago
> 3. Hypothetical answer generation from a query using an LLM, and then using that hypothetical answer to query for embeddings works really well.
I've been wondering about that and am glad to hear it's working in the wild.
I'm now wondering if using a fine-tuned LLM (on the corpus) to gen the hypothetical answers and then use those for the rag flow would work even better.
The technique of generating hypothetical answers (or documents) from the query was first described in the "HyDE (Hypothetical Document Expansion) paper". [1]
Interestingly, going both ways: generate hypothetical answers for the query, and also generate hypothetical questions for the text chunk at ingestion both increase RAG performance in my experience.
Though LLM-based query-processing is not always suitable for chat applications if inference time is a concer (like near-real time customer support RAG), so ingestion-time hypothetical answer generation is more apt there.
1. https://aclanthology.org/2023.acl-long.99/
We do this as well with a lot of success. It’s cool to see others kinda independently coalescing around this solution.
What we find really effective is at content ingestion time, we prepend “decorator text” to the document or chunk. This incorporates various metadata about the document (title, author(s), publication date, etc).
Then at query time, we generate a contextual hypothetical document that matches the format of the decorator text.
We add hybrid search (BM25 and rerank) to that, also add filters (documents published between these dates, by this author, this type of content, etc). We have an LLM parameterize those filters and use them as part of our retrieval step.
This process works incredibly for end users.
but what about the chunk size, if we have a small chunks like 1 sentence and the hyde embeddings are most of the time larger, the results are not so good