Comment by barishnamazov
12 hours ago
I like that this relies on generating SQL rather than just being a black-box chat bot. It feels like the right way to use LLMs for research: as a translator from natural language to a rigid query language, rather than as the database itself. Very cool project!
Hopefully your API doesn't get exploited and you are doing timeouts/sandboxing -- it'd be easy to do a massive join on this.
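For what it's worth, on Postgres that guard can be as simple as a statement timeout on whatever role runs the generated SQL -- something along these lines (the role name and limits are just illustrative, not the project's actual config):

    -- Cap runtime for the role that executes LLM-generated queries
    ALTER ROLE api_readonly SET statement_timeout = '5s';
    ALTER ROLE api_readonly SET idle_in_transaction_session_timeout = '10s';

    -- Or per-session, right before running a generated query
    SET statement_timeout = '5s';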
I also have a question, mostly stemming from my not being knowledgeable in the area -- have you noticed any semantic bleeding when searching across your datasets? E.g., "optimization" probably means different things on ArXiv, LessWrong, and HN. I'm wondering whether vector search accounts for this when given a more specific question.
This is the route I went down to make Claude Code and Codex conversation histories local and queryable by the CLIs themselves.
Create the DB and provide the tools and skill.
This blog entry explains how: https://contextify.sh/blog/total-recall-rag-search-claude-co...
It's a macOS client at present, but I have a Linux-ready engine I could use early feedback on if anyone is interested in giving it a go.
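As a very rough sketch of the "create the DB" step (SQLite here; table and column names are my own guesses, not the actual schema), something like this is enough for the CLIs to query:

    -- Hypothetical local store for conversation history
    CREATE TABLE messages (
      id INTEGER PRIMARY KEY,
      session_id TEXT NOT NULL,   -- which Claude Code / Codex session
      role TEXT NOT NULL,         -- 'user' or 'assistant'
      created_at TEXT NOT NULL,
      content TEXT NOT NULL
    );

    -- Separate FTS5 index, populated alongside messages (rowid = messages.id)
    CREATE VIRTUAL TABLE messages_fts USING fts5(content);

    -- The kind of query a tool or skill could expose to the CLI
    SELECT m.session_id, m.created_at,
           snippet(messages_fts, 0, '[', ']', '...', 10) AS excerpt
    FROM messages_fts
    JOIN messages m ON m.id = messages_fts.rowid
    WHERE messages_fts MATCH 'vector search'
    ORDER BY rank
    LIMIT 5;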
I don't have the experiments to prove this, but in my experience it's highly variable between embedding models.
Larger, more capable embedding models are better able to separate the different uses of a given word in the embedding space; smaller models are not.
I'm using Voyage-3.5-lite at halfvec(2048), which, from my limited research, seems to be one of the best embedding models. There's semi-sophisticated ~300-token chunking that breaks on paragraph and sentence boundaries.
When Claude uses our embed endpoint to embed arbitrary text as a search vector, it should work pretty well across domains. One can also use compositions of centroids (averages) of vectors in our database as search vectors.
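A rough sketch of the centroid idea with pgvector (the chunks table and column names below are placeholders, not our actual schema):

    -- Placeholder schema: chunks(embedding halfvec(2048), source text, title text)
    -- Average the embeddings of a seed set to get a centroid,
    -- then rank everything by cosine distance against it.
    WITH centroid AS (
      SELECT avg(embedding::vector)::halfvec(2048) AS v
      FROM chunks
      WHERE source = 'lesswrong' AND title ILIKE '%optimization%'
    )
    SELECT c.title
    FROM chunks c, centroid
    ORDER BY c.embedding <=> centroid.v
    LIMIT 20;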
I've been thinking about this a fair bit lately. We have all sorts of benchmarks that describe a lot of factors in detail, but they are all very abstract and yet don't seem to map clearly onto well-observed behaviors. I think we need to come up with a different way to list them.