Comment by derefr
7 months ago
> adding a contextual title, summary, keywords, and questions
That's interesting; do you then transform the question-as-prompt before embedding it at runtime, so that it "asks for" that metadata in the response? Otherwise, it would seem to me that you're just making it harder for the prompt vector and the document vectors to match.
(I guess, if it's equally harder in all cases, then that might be fine. But if some of your documents have few tags or no title or something, they might be unfairly advantaged in a vector-distance-ranked search, because their format more closely resembles the response format the question was expecting...)
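A toy way to see the mismatch: below, the "embedding" is just a bag-of-words count vector with cosine similarity (a stand-in, not a real model), and `wrap` is a hypothetical metadata template, since the actual pipeline's format isn't specified. Embedding the bare query against a metadata-wrapped document scores lower than transforming the query into the same template shape first.

```python
# Illustrates the format-mismatch concern: documents embedded with a metadata
# wrapper drift away from bare queries. Bag-of-words counts stand in for a
# real embedding model; the template below is purely illustrative.
from collections import Counter
from math import sqrt

def embed(text):
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb)

def wrap(text, title, keywords):
    # hypothetical metadata template, not the commenter's actual format
    return f"title: {title} keywords: {' '.join(keywords)} content: {text}"

doc = wrap("reset your password from the account settings page",
           "password reset", ["password", "account", "login"])
query = "how do I reset my password"

plain = cosine(embed(query), embed(doc))
# transform the query into the same template shape before embedding it
wrapped = cosine(embed(wrap(query, "password reset", ["password"])), embed(doc))
print(plain < wrapped)  # the template-matched query scores higher
```

With a real embedding model the effect is softer, but the direction is the same: whichever side of the comparison shares the document's surface format gets a similarity boost.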
You can also train query awareness into the embedding model. This avoids having an LLM rewrite questions poorly, and lets you embed questions the way your customers actually ask them.
For a multimodal example: https://www.marqo.ai/blog/generalized-contrastive-learning-f...
But the same approach works with text.
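The core objective behind that kind of training is an in-batch contrastive loss (InfoNCE): each query is pulled toward its paired document and pushed away from the other documents in the batch. A minimal NumPy sketch, with random vectors standing in for a real encoder's outputs (none of this is Marqo's API):

```python
# Sketch of the in-batch contrastive (InfoNCE) objective used to train
# query-aware embedding models. Random vectors substitute for encoder output.
import numpy as np

rng = np.random.default_rng(0)

def info_nce_loss(q, d, temperature=0.05):
    # cosine-similarity logits between every query and every document
    q = q / np.linalg.norm(q, axis=1, keepdims=True)
    d = d / np.linalg.norm(d, axis=1, keepdims=True)
    logits = q @ d.T / temperature
    # the matching document for query i sits on the diagonal
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))

queries = rng.normal(size=(8, 32))
docs = queries + 0.1 * rng.normal(size=(8, 32))  # each doc pairs with its query
shuffled = rng.permutation(docs)                 # break the pairing
print(info_nce_loss(queries, docs) < info_nce_loss(queries, shuffled))
```

Training minimizes this loss over real (question-as-asked, document) pairs, so the model itself learns to map customer phrasing onto document format, with no query rewriting at inference time.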