Comment by dennis16384
2 days ago
Why would an LLM want to look into the contents, what for?
We have low-cardinality data, and yes, this is safe to share and required to build an actual query.
Then we have high-cardinality and possibly PII data: there’s absolutely no reason to share that, there’s nothing for an LLM to analyse there. Also, a semantic index (vector search) will find relevant records much faster and more accurately than any chain-of-thought, with just an LLM-authored search fn call.
Further, there are continuous numerical values, and there’s not much an LLM needs to see in there either. We can say, for example, that looking at data distributions when building your analysis can drive your analysis logic, but another point of view here is that it creates unnecessary bias instead.
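To make the split above concrete, here is a rough sketch of what I mean by sharing only low-cardinality values. The function name `describe_columns` and the `max_distinct` threshold are made up for illustration, and the cutoff of 20 is an arbitrary assumption, not a recommendation. It lists distinct values only for string columns with few distinct values; everything else (high-cardinality strings, numbers) is reduced to a column name and type.

```python
from typing import Any


def describe_columns(
    rows: list[dict[str, Any]], max_distinct: int = 20
) -> dict[str, dict[str, Any]]:
    """Summarize columns for an LLM prompt without exposing most cell values.

    Hypothetical helper: `max_distinct` is an assumed cutoff for what counts
    as "low-cardinality". Values must be hashable (strings, numbers, etc.).
    """
    # Collect the values of each column across all rows.
    columns: dict[str, list[Any]] = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)

    summary: dict[str, dict[str, Any]] = {}
    for name, values in columns.items():
        non_null = [v for v in values if v is not None]
        # Infer the type from the first non-null value (a simplification).
        kind = type(non_null[0]).__name__ if non_null else "unknown"
        entry: dict[str, Any] = {"type": kind}
        distinct = set(non_null)
        # Only string columns with few distinct values expose their values.
        # Numeric and high-cardinality columns are shared as name + type only.
        if kind == "str" and len(distinct) <= max_distinct:
            entry["values"] = sorted(distinct)
        summary[name] = entry
    return summary
```

Caveat: a distinct-count cutoff alone is not a PII check. On a very small table an email column can also fall under the threshold, so a real version would need an explicit allow-list or a uniqueness-ratio test on top.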
On re-read I think I might have overreached in my reply. I think having local LLMs able to run tool loops to _transform_ data, rather than just summarize or analyze it, will become 1/ great for non-technical users, 2/ fast.