Comment by olooney

14 hours ago

Yes, I'm planning out-of-the-box support for nomic[1] which can run in-process, and ollama which runs as a local server and supports many free embedding models[2].

[1]: https://www.nomic.ai/blog/posts/nomic-embed-text-v1

[2]: https://ollama.com/search?c=embedding

Project is super cool.

If you're adding more LLM integration, a cool feature might be sending the results of allow_many="left" off to an LLM completions API that supports structured outputs. Eg imagine N_left=1e5 and N_right=1e5 but they are different datasets. You could use jellyjoin to identify the top ~5 candidates in right for each left, reducing candidate matches from 1e10 to 5e5. Then you ship the 5e5 off to an LLM for final scoring/matching.