Comment by deepsquirrelnet

1 year ago

This is cool! How is the graph stored and queried? I'm familiar with graph databases, but I don't see one listed as a dependency.

Have you tried the SciPhi Triplex model for extraction? I've tried doing some extraction before, but got inconsistent results when I ran extraction on the same chunks several times in a row.


liukidar 1 year ago

The graph is currently stored using python-igraph. The codebase is designed so that any graph DB can be integrated by writing a light wrapper around it (we will add support for things like Neo4j in the near future). We haven't tried Triplex, since we found gpt-4o-mini fast and precise enough for now (and we use it not only to extract entities and relationships, but also to generate descriptions and resolve conflicts), but results should certainly improve with fine-tuning.

The graph is queried by first finding an initial set of nodes relevant to the query and then running Personalized PageRank from those nodes to find other relevant passages. Currently, we select the initial nodes with semantic search over both the whole query and the entities extracted from it, but we are planning some other exciting additions to this method :)
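For readers who haven't seen the "seed nodes + Personalized PageRank" pattern before, here is a minimal, dependency-free sketch of the two query steps described above. It is not the project's code. All names (`select_seeds`, `personalized_pagerank`, `cosine`) and the tiny graph and embeddings are hypothetical. Real embeddings would come from an embedding model, and python-igraph itself provides `Graph.personalized_pagerank` with a `reset_vertices` argument for the second step.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def select_seeds(query_vecs, node_vecs, top_k=2):
    """Hypothetical seed selection: score each node by its best cosine
    similarity to any query vector (e.g. the whole query plus the
    entities extracted from it) and keep the top_k nodes."""
    scores = {
        node: max(cosine(q, v) for q in query_vecs)
        for node, v in node_vecs.items()
    }
    return sorted(scores, key=scores.get, reverse=True)[:top_k]

def personalized_pagerank(adj, seeds, damping=0.85, iters=100, tol=1e-10):
    """Power-iteration Personalized PageRank on a graph given as
    {node: [neighbours]}. The random walk teleports back to the seed
    nodes (uniformly) instead of to every node."""
    nodes = list(adj)
    seed_set = set(seeds)
    reset = {n: (1.0 / len(seed_set) if n in seed_set else 0.0) for n in nodes}
    rank = dict(reset)
    for _ in range(iters):
        # Teleport mass goes only to the seeds.
        new = {n: (1 - damping) * reset[n] for n in nodes}
        for n in nodes:
            nbrs = adj[n]
            if nbrs:
                # Spread this node's rank evenly over its neighbours.
                share = damping * rank[n] / len(nbrs)
                for m in nbrs:
                    new[m] += share
            else:
                # Dangling node: send its mass back to the seeds.
                for s in seed_set:
                    new[s] += damping * rank[n] / len(seed_set)
        delta = sum(abs(new[n] - rank[n]) for n in nodes)
        rank = new
        if delta < tol:
            break
    return rank

# Toy example (hypothetical data): an undirected entity graph plus 3-d embeddings.
adj = {
    "Alice": ["Bob", "Acme"],
    "Bob": ["Alice", "Acme"],
    "Acme": ["Alice", "Bob", "Paris"],
    "Paris": ["Acme"],
    "Tokyo": [],
}
node_vecs = {
    "Alice": [1.0, 0.0, 0.0],
    "Bob": [0.9, 0.1, 0.0],
    "Acme": [0.0, 1.0, 0.0],
    "Paris": [0.0, 0.0, 1.0],
    "Tokyo": [0.0, 0.1, 1.0],
}
query_vecs = [[1.0, 0.05, 0.0]]
seeds = select_seeds(query_vecs, node_vecs, top_k=2)
ranks = personalized_pagerank(adj, seeds)
```

In the toy run, `Acme` is never picked as a seed, but it still receives a high score because it sits next to both seeds. That propagation through the graph is what PageRank adds over plain semantic search. The unconnected `Tokyo` node ends up with zero score.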

katelatte 1 year ago

Suggestion: check out Memgraph for graph db storage - https://memgraph.com/. I work at Memgraph as DX Engineer so feel free to ping me in case you have questions about it: https://memgraph.com/office-hours
Your solution looks interesting and I would love to hear more about it. I haven't seen that many PageRank-based graph exploration tools.