Comment by srameshc

2 months ago

I was thinking about intertwining vector and graph, because I have one specific use case that requires this combination. But I am not courageous or competent enough to build such a DB. So I am very excited to see this project, and I am certainly going to use it. One question: what kind of hardware do you think this would require? I am asking because, from what I understand, graph database performance is directly proportional to the amount of RAM available, and vectors also need persistence and computational resources.

GeorgeCurtis 2 months ago

The fortunate thing about our vector DB, as I mentioned in the post, is that we store the HNSW index on disk, so it is much less intensive on your memory. It's similar to what turbopuffer has done.

With regard to the graph db, we mostly use our laptops to test it and haven't run into an issue with performance yet on any size dataset.

If you wanna chat DM me on X :)

UltraSane 2 months ago

Neo4j supports vector indexes

GeorgeCurtis 2 months ago
First of all, Neo4j is very slow for vectors, so if performance matters for your user experience, it definitely isn't a viable option. This is probably why Neo4j themselves have released guides on how to build that middleman software I mentioned with Qdrant for viable performance.
Furthermore, vectors are capped at 4k dimensions, which may be enough most of the time but is a problem for some of the users we've spoken to. Also, they don't allow pre-filtering, which is a problem for a few people we've spoken to, including Zep AI. They are on the right track, but there are a lot of holes that we are hoping to fill :)
Edit: AND, it is super memory intensive. People have had memory overflows even when using extremely small datasets.
- mauvo59 2 months ago
  
  Hey, want to correct some of your statements here. :-)
  Neo4j's vector index uses Lucene's HNSW implementation. So, the performance of vector search is the same as that of Lucene. It's worth noting that performance suffers when configured without sufficient memory, like all HNSW vector indexes.
  >> This is probably why Neo4j themselves have released guides on how to build that middleman software I mentioned with Qdrant for viable performance.
  No, this is about supporting our customers. Combining graphs and vectors in a single database is the best solution for many users - integration brings convenience, consistency, and performance. But we also recognise that customers might already have invested in a dedicated vector database, need additional vector search features we don't support, or benefit from separating graph and vector resources. Generally, integrating well with the broader data ecosystem helps people succeed.
  >> Furthermore, vectors are capped at 4k dimensions
  We occasionally get asked about support for 8k vectors. But so far, whenever I've followed up with users, there doesn't seem to be a case for them. At ~32 KB per embedding, they're often not practical in production. Happy to hear about use cases I've missed.
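For anyone wanting to check the ~32 KB figure, here is a rough back-of-the-envelope sketch. It assumes each vector component is stored as a 4-byte float32 and that "8k" means 8,192 dimensions; neither is stated in the thread, and the function names are made up for illustration. It counts raw vector data only, not HNSW graph links or any other index overhead.

```python
BYTES_PER_FLOAT32 = 4  # assumption: components are stored as float32


def embedding_bytes(dimensions: int, bytes_per_component: int = BYTES_PER_FLOAT32) -> int:
    """Raw size of one embedding, ignoring any index overhead."""
    return dimensions * bytes_per_component


def raw_vector_storage_gib(num_vectors: int, dimensions: int) -> float:
    """Raw storage for num_vectors embeddings, in GiB (2**30 bytes)."""
    return num_vectors * embedding_bytes(dimensions) / 2**30


if __name__ == "__main__":
    for dims in (4096, 8192):
        per_vector_kib = embedding_bytes(dims) / 1024
        total_gib = raw_vector_storage_gib(1_000_000, dims)
        print(f"{dims} dims: {per_vector_kib:.0f} KiB per vector, "
              f"{total_gib:.1f} GiB per million vectors")
```

Under those assumptions, a million 8,192-dimension vectors come to roughly 30 GiB of raw data before any index structure is added, which gives a sense of why very wide embeddings get expensive quickly.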
  >> Also, they don't allow pre-filtering, which is a problem for a few people we've spoken to, including Zep AI.
  We support pre- and post-filtering. We're currently implementing metadata filtering, which may be what you're referring to.
  >> AND, it is super memory intensive.
  It's no more memory-intensive than other similar implementations. I get that different approaches have different hardware requirements. But in all cases, a misconfigured system will perform poorly.
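Since pre- versus post-filtering comes up above, here is a minimal generic sketch of the difference. It uses brute-force cosine similarity over an in-memory list, not HNSW, and it is not how Neo4j, Lucene, or the project under discussion implement filtering. All names and the sample data are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def post_filter_search(query, items, predicate, k):
    """Take the k nearest items first, then drop those failing the predicate."""
    nearest = sorted(items, key=lambda it: cosine(query, it["vector"]), reverse=True)[:k]
    return [it for it in nearest if predicate(it)]


def pre_filter_search(query, items, predicate, k):
    """Drop items failing the predicate first, then take the k nearest."""
    candidates = [it for it in items if predicate(it)]
    return sorted(candidates, key=lambda it: cosine(query, it["vector"]), reverse=True)[:k]


# Hypothetical sample data: two English and two German documents.
ITEMS = [
    {"id": "a", "vector": [1.0, 0.0], "lang": "en"},
    {"id": "b", "vector": [0.9, 0.1], "lang": "en"},
    {"id": "c", "vector": [0.5, 0.5], "lang": "de"},
    {"id": "d", "vector": [0.0, 1.0], "lang": "de"},
]


def is_german(item):
    return item["lang"] == "de"


if __name__ == "__main__":
    query = [1.0, 0.0]
    print([it["id"] for it in post_filter_search(query, ITEMS, is_german, k=2)])
    print([it["id"] for it in pre_filter_search(query, ITEMS, is_german, k=2)])
```

In this toy data the two nearest neighbours are both English, so post-filtering returns nothing while pre-filtering still returns two German results. That gap is the usual reason people ask for pre-filtering when the filter is selective.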
  