Comment by m00dy 11 hours ago
RAG is broken when you have too much data.
9 comments
plingamp 10 hours ago Specifically when the document number reaches around 10k+, a phenomenon called "Semantic Collapse" occurs. https://dho.stanford.edu/wp-content/uploads/Legal_RAG_Halluc...
yjftsjthsd-h 8 hours ago So you're telling me rampancy ( https://www.halopedia.org/Rampancy ) is real.
zophi 5 hours ago
> Specifically when the document number reaches around 10k+
Where are you getting this? just read the paper and not seeing it -- interested to learn more
RGamma 3 hours ago The RAG GP used suffered from semantic collapse.
thunky 10 hours ago Gemini with Google search is RAG using all public data, and it isn't broken.
fhd2 10 hours ago It's not tool use with natural language search queries? That's what I'd expect.
thunky 8 hours ago It's RAG via tool use, where the storage and retrieval method is an implementation detail. I'm not a huge fan of the term RAG though, because if you squint almost all tool use could be considered RAG. But if you stick with RAG being a form of "knowledge search" then I think Google search easily fits.
kaicianflone 10 hours ago It is tool use with natural language search queries, but going down a layer, those queries are searched on a vector DB, very similar to RAG. Essentially Google RankBrain is a very distant ancestor of RAG, from before compute and scaling.
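A minimal sketch of the "implementation detail" point in this subthread, not anyone's actual system: the prompt-building step only sees a `search` callable, so a keyword index, a vector DB, or a web search tool could sit behind it. The names `DOCS`, `keyword_search`, and `build_prompt` are hypothetical, and the toy keyword backend is only there to make the example runnable.

```python
from typing import Callable, List

# Hypothetical toy corpus, lowercase and unpunctuated so naive word matching works.
DOCS = [
    "vector databases store embeddings for similarity search",
    "keyword indexes map terms to the documents that contain them",
    "rampancy is a fictional condition from a video game",
]


def keyword_search(query: str, k: int = 2) -> List[str]:
    """One possible backend: rank documents by shared query words."""
    words = set(query.lower().split())
    scored = [(len(words & set(doc.split())), doc) for doc in DOCS]
    hits = [(score, doc) for score, doc in scored if score > 0]
    hits.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in hits[:k]]


def build_prompt(question: str, search: Callable[[str], List[str]]) -> str:
    """Retrieve with whatever `search` tool is supplied, then assemble the prompt.

    This function never learns how the passages were stored or ranked.
    """
    passages = search(question)
    context = "\n".join(f"- {p}" for p in passages)
    return f"Context:\n{context}\n\nQuestion: {question}"


if __name__ == "__main__":
    print(build_prompt("how do vector databases work", keyword_search))
```

Swapping `keyword_search` for an embedding-based function with the same signature would leave `build_prompt` unchanged, which is roughly what "RAG via tool use" amounts to in this framing.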
PlatoIsADisease 10 hours ago Can't you make thresholds higher? Hmm... I guess not, you might want all that data. Super interesting topic. Learning a lot.
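To make the threshold question concrete, here is a small sketch assuming retrieval scores are cosine similarities between a query vector and document vectors. The function names, the 2-D vectors, and the threshold values are made up for illustration. It shows the trade-off raised above: a higher threshold cuts noise but also drops partially relevant documents you might have wanted.

```python
import math
from typing import List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(
    query_vec: Sequence[float],
    docs: List[Tuple[str, Sequence[float]]],
    threshold: float,
    k: int,
) -> List[str]:
    """Keep documents scoring at least `threshold`, then return the top `k` names."""
    scored = [(cosine(query_vec, vec), name) for name, vec in docs]
    kept = [(score, name) for score, name in scored if score >= threshold]
    kept.sort(reverse=True)  # highest similarity first
    return [name for _, name in kept[:k]]


# Hypothetical 2-D "embeddings": a is an exact match, b is partially related, c is unrelated.
TOY_DOCS = [("a", [1.0, 0.0]), ("b", [1.0, 1.0]), ("c", [0.0, 1.0])]
QUERY = [1.0, 0.0]

if __name__ == "__main__":
    print(retrieve(QUERY, TOY_DOCS, threshold=0.5, k=3))  # keeps a and b
    print(retrieve(QUERY, TOY_DOCS, threshold=0.9, k=3))  # b is dropped
```

With the loose threshold the partially related document survives; with the strict one only the exact match does, so whether "higher" is better depends on how much you need that second kind of result.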