Comment by GeorgeCurtis

6 months ago

Kuzu doesn't support incremental indexing on vectors. The vector index is completely separate and decoupled from the graph.

I.e., you have to re-index all of the vectors whenever you update any of them.
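To make the difference concrete, here is a toy sketch contrasting a rebuild-only index with one that accepts single-vector updates. Everything in it is an illustrative assumption: the class names, the `vectors_processed` counter, and the brute-force "index" (just normalized vectors) are made up for this example and do not reflect how Kuzu or HelixDB actually store or index vectors.

```python
# Toy illustration only: hypothetical classes, not Kuzu's or HelixDB's internals.
import math


def normalize(v):
    """Scale a vector to unit length (the only 'indexing' work in this toy)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


class RebuildOnlyIndex:
    """A snapshot index: any change to the data means rebuilding everything."""

    def __init__(self):
        self.entries = {}
        self.vectors_processed = 0  # counts indexing work done so far

    def build(self, vectors):
        self.entries = {vid: normalize(v) for vid, v in vectors.items()}
        self.vectors_processed += len(vectors)


class IncrementalIndex:
    """An index that can absorb one changed vector at a time."""

    def __init__(self):
        self.entries = {}
        self.vectors_processed = 0

    def upsert(self, vid, v):
        self.entries[vid] = normalize(v)
        self.vectors_processed += 1


# Initial load of 1000 vectors into both indexes.
vectors = {i: [float(i + 1), 1.0] for i in range(1000)}

rebuild_only = RebuildOnlyIndex()
rebuild_only.build(vectors)

incremental = IncrementalIndex()
for vid, v in vectors.items():
    incremental.upsert(vid, v)

# A single vector changes.
vectors[0] = [0.5, 2.0]
rebuild_only.build(vectors)         # has to process all 1000 vectors again
incremental.upsert(0, vectors[0])   # processes only the changed vector

print(rebuild_only.vectors_processed, incremental.vectors_processed)
```

Both indexes end up with the same contents; the only difference is how much work a single update costs. Real approximate-nearest-neighbour indexes do far more per vector than this, so the gap matters more in practice.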

2 comments


wontonaroo 6 months ago

Firstly, congratulations on your effort.

How does the graph component of your database perform compared to Kuzu? Do you have any benchmarks?

For RAG I've tried Qdrant, Meilisearch, and Kuzu. At the moment I wouldn't consider HelixDB because of HelixQL. I'm wondering why you didn't use OpenCypher?

At the moment you have a system that is aimed at supporting AI/LLM applications, but by creating HelixQL you've ended up without an AI-coding-friendly query language.

With OpenCypher, even older, cheap models can generate queries. Or maybe you could offer some kind of GraphQL layer.

GeorgeCurtis 6 months ago

Thanks for the support :)
We're currently working on benchmarks, so we have nothing exact on how we compare to Kuzu on performance right now. We've had quite a few requests for benchmark comparisons against different databases, so they should take a good few days. We'll return here when they're ready.
When we've used Cypher in the past, we didn't get on that well with the methodology of the language. A functional approach, like Gremlin's, suited our minds better. But Gremlin's syntax is awful (in our opinion), and we felt the amount of boilerplate code you need was unnecessary.
We wanted to make something that was easier to read than Gremlin, like Cypher, but that also had the functional aspect that makes traversals feel so much more intuitive.
Another note, we're more fond of type-safe languages, and it didn't make much sense to us that out of all the programming languages that exist, query languages were the non-type-safe ones.
We know it's a pain learning a new language, but we really believe that our approach will pave the way for a better development experience and a better paradigm.
Onto the AI stuff: you're right, it isn't ideal (right now). We did make a GPT wrapper that did a pretty good job of writing queries based on a condensed version of our docs, but that is only a stopgap. So, the next thing on our roadmap is a graph traversal MCP tool. Instead of the agent having to generate queries as text, it can use the traversal tools and decide where it should hop to at each step.
We know we're being quite ambitious here, but we think there's a lot we can improve on over existing solutions.
Thanks again :)
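A rough sketch of the traversal-tool idea described above, to show the shape of "decide where to hop at each step" instead of generating a whole query as text. Everything here is hypothetical: the `describe_node` and `hop` tool names, the tiny in-memory graph, and the `scripted_agent` function (a stand-in for an LLM choosing among the offered edges) are assumptions made for illustration, not HelixDB's MCP tool or API.

```python
# Hypothetical sketch: tool names and data are made up, not HelixDB's MCP API.

GRAPH = {
    "alice": {"label": "Person", "out": {"WORKS_AT": ["acme"], "KNOWS": ["bob"]}},
    "bob": {"label": "Person", "out": {"WORKS_AT": ["globex"]}},
    "acme": {"label": "Company", "out": {"LOCATED_IN": ["berlin"]}},
    "globex": {"label": "Company", "out": {}},
    "berlin": {"label": "City", "out": {}},
}


def describe_node(node_id):
    """Tool 1: show the agent where it is and which edge types it can follow."""
    node = GRAPH[node_id]
    return {"id": node_id, "label": node["label"], "edge_types": sorted(node["out"])}


def hop(node_id, edge_type):
    """Tool 2: follow one edge type from a node and return the target ids."""
    return list(GRAPH[node_id]["out"].get(edge_type, []))


def scripted_agent(goal_label, view):
    """Stand-in for an LLM: pick the next edge type from a fixed preference list."""
    for edge_type in ("WORKS_AT", "LOCATED_IN"):
        if edge_type in view["edge_types"]:
            return edge_type
    return None  # nothing useful to follow from here


def traverse(start, goal_label, choose, max_hops=5):
    """Walk the graph one tool call at a time until a node with goal_label is hit."""
    current, path = start, [start]
    for _ in range(max_hops):
        view = describe_node(current)
        if view["label"] == goal_label:
            return path
        edge_type = choose(goal_label, view)
        if edge_type is None:
            return None
        targets = hop(current, edge_type)
        if not targets:
            return None
        current = targets[0]  # a real agent would choose among the targets too
        path.append(current)
    return path if describe_node(current)["label"] == goal_label else None


path = traverse("alice", "City", scripted_agent)
print(path)
```

The agent only ever sees a small structured view and picks from the options it is given, so it cannot produce a syntactically invalid query. The trade-off is one tool call per hop rather than a single query round trip.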