Comment by deepsquirrelnet

3 days ago

This is cool! How is the graph stored and queried? I’m familiar with graph databases, but I don’t see that as a dependency.

Have you tried the sciphi triplex model for extraction? I’ve tried to do some extraction before, but got inconsistent results if I extracted the chunks multiple times consecutively.

The graph is currently stored using python-igraph. The codebase is designed such that it is easy to integrate any graphdb by writing a light wrapper around it (we will provide support to stuff like neo4j in the near future). We haven't tried triplex since we saw that gpt4o-mini is fast and precise enough for now (and we use it not only for extraction of entities and relationships, but also to get descriptions and resolve conflicts), but for sure with fine tuning results should improve. The graph is queried by finding an initial set of nodes that are relevant to a given query and then running personalized pageranking from those nodes to find other relevant passages. Currently, we select the inital nodes with semantic search both on the whole query and entities extracted from it, but we are planning for other exciting additions to this method :)