Valori – A Python-native Vector Database I built from scratch

2 days ago

I’ve been working on a project called Valori, a Python-native vector database I built from the ground up — not by reinventing every algorithm, but by wiring together efficient, well-known indexing and search techniques into a cohesive, hackable framework.

The idea came from my frustration with existing vector DBs that were either too heavy for experimentation or too opaque to modify. I wanted something simple, modular, and extensible — so I built it.

What it does:

Lets you store, index, and search high-dimensional vectors

Supports multiple indices (Flat, HNSW, IVF, LSH, Annoy)

Has memory, disk, and hybrid storage backends

Includes a full document processing pipeline (parsing, cleaning, chunking, embedding)

Offers quantization, persistence, and plugin-based extensibility

All written in Python, integrated with NumPy, and production-tested with logging and monitoring built in.
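For a rough sense of what the simplest of those index types involves, here is a minimal sketch of a flat (brute-force) cosine-similarity index in plain NumPy. It is illustrative only and is not Valori's actual API: the `FlatIndex` class and its `add` and `search` methods are hypothetical names chosen for this example.

```python
import numpy as np


class FlatIndex:
    """Brute-force cosine-similarity index (hypothetical; not Valori's API)."""

    def __init__(self, dim):
        self.dim = dim
        self.vectors = np.empty((0, dim), dtype=np.float32)
        self.ids = []

    def add(self, ids, vectors):
        vectors = np.asarray(vectors, dtype=np.float32).reshape(-1, self.dim)
        if len(ids) != len(vectors):
            raise ValueError("ids and vectors must have the same length")
        # Normalise once at insert time so search is a single matrix product.
        norms = np.linalg.norm(vectors, axis=1, keepdims=True)
        vectors = vectors / np.maximum(norms, 1e-12)
        self.vectors = np.vstack([self.vectors, vectors])
        self.ids.extend(ids)

    def search(self, query, k=5):
        query = np.asarray(query, dtype=np.float32)
        query = query / max(float(np.linalg.norm(query)), 1e-12)
        # Cosine similarity against every stored vector (exact, O(n * dim)).
        scores = self.vectors @ query
        k = min(k, len(self.ids))
        top = np.argsort(-scores)[:k]
        return [(self.ids[i], float(scores[i])) for i in top]


index = FlatIndex(dim=3)
index.add(["a", "b", "c"], [[1, 0, 0], [0, 1, 0], [0.9, 0.1, 0]])
results = index.search([1, 0, 0], k=2)
print(results)
```

A flat index like this is exact but scans every vector on each query. Approximate structures such as HNSW or IVF trade some recall for speed.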

Install:

pip install valori

GitHub: https://github.com/varshith-Git/valori

PyPI: https://pypi.org/project/valori

I’d love to hear your thoughts —

What’s missing for you in current vector DBs?

If you’ve built LLM or RAG systems, what do you wish a lightweight, pure Python DB like this handled better?

Would you prefer tighter integrations (LangChain, Haystack, etc.) or a more “build-it-yourself” style?

Feedback, criticism, or collaboration ideas are all welcome. Varshith (varshith.gudur17@gmail.com)

how much was this vibe coded? looks cool but it's too much for me to digest.

where did you get the original mental model to begin building it?

  • It’s definitely dense, but not as wild as it looks. The mental model was: take the core building blocks from FAISS and Milvus, make them composable in Python, and expose everything clearly.

    The “vibe” part came from trying to make it feel like a system that could run in production, not just a toy. So yeah, it’s a little heavy, but it earned the vibe honestly.

What’s the advantage of this being in Python?

  • The point isn’t raw speed; it’s hackability. You can plug in new models or indexing layers in minutes without dropping to C++.

  • I think the “simple, modular, and extensible” part makes this interesting. And for those goals, it being written in Python is relevant.

    • Exactly. Python makes the whole stack composable instead of compiled shut. That’s where the fun (and flexibility) lives.
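To make the "plug in a new indexing layer" point concrete, here is one way such a plugin mechanism could look in pure Python. This is a sketch under assumptions, not Valori's real extension API: `INDEX_REGISTRY`, `register_index`, `create_index`, and `DotProductIndex` are all hypothetical names invented for illustration.

```python
from typing import Callable, Dict, Protocol, Sequence


class Index(Protocol):
    """Minimal interface every pluggable index is assumed to satisfy."""

    def add(self, key: str, vector: Sequence[float]) -> None: ...
    def search(self, query: Sequence[float], k: int) -> list: ...


# Hypothetical registry mapping a name to an index factory.
INDEX_REGISTRY: Dict[str, Callable[[], Index]] = {}


def register_index(name):
    """Class decorator that makes an index available by name."""
    def decorator(cls):
        INDEX_REGISTRY[name] = cls
        return cls
    return decorator


@register_index("dot")
class DotProductIndex:
    """Toy index that ranks stored vectors by dot product with the query."""

    def __init__(self):
        self.items = []

    def add(self, key, vector):
        self.items.append((key, list(vector)))

    def search(self, query, k):
        scored = [
            (key, sum(q * v for q, v in zip(query, vec)))
            for key, vec in self.items
        ]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:k]


def create_index(name):
    """Instantiate a registered index by name."""
    return INDEX_REGISTRY[name]()


idx = create_index("dot")
idx.add("x", [1.0, 0.0])
idx.add("y", [0.0, 1.0])
hits = idx.search([0.2, 0.8], k=1)
print(hits)
```

Adding a new index type is then a matter of writing one class and decorating it, with no build step or compiled extension involved.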

dude you already missed the window.

nothing is better than sqlite as a library and don't use high performance as your value for a python product

  • SQLite’s perfect if you’ve got rows and tables. Valori’s for when you’ve got embeddings and chaos.