Comment by liukidar

3 days ago

This is super interesting! Thanks for sharing. Here we are talking about graphs with millions of nodes/edges, so efficiency is not that big of a deal, since things are going to be parsed by an LLM to craft an answer anyway, and that will always be the bottleneck. Indeed, PageRank is the first step, but we would be happy to test more accurate alternatives. Importantly, we are using personalized PageRank here, meaning we give specific initial weights to a set (potentially quite large) of nodes. Would TC support that (as well as giving weights to edges, since we are also looking into that)?
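For readers unfamiliar with the term, here is a minimal sketch of personalized PageRank with edge weights, written as a plain power iteration. It is not the implementation discussed in this thread; the function name `personalized_pagerank`, its parameters, and the toy graph are all hypothetical and only illustrate how seed weights and edge weights enter the computation.

```python
def personalized_pagerank(edges, seeds, alpha=0.85, iters=100, tol=1e-10):
    """edges: {(src, dst): weight} (directed); seeds: {node: initial weight}."""
    nodes = sorted({n for edge in edges for n in edge} | set(seeds))
    total = sum(seeds.values())
    # Teleport distribution: normalized seed weights, zero for unseeded nodes.
    teleport = {n: seeds.get(n, 0.0) / total for n in nodes}

    # Total outgoing weight per node, used to normalize edge weights.
    out_weight = {n: 0.0 for n in nodes}
    for (src, _), w in edges.items():
        out_weight[src] += w

    rank = dict(teleport)
    for _ in range(iters):
        # Mass sitting on nodes without outgoing edges goes back to the seeds.
        dangling = sum(rank[n] for n in nodes if out_weight[n] == 0.0)
        new = {n: (1 - alpha + alpha * dangling) * teleport[n] for n in nodes}
        # Each node passes its rank along its edges, proportionally to weight.
        for (src, dst), w in edges.items():
            new[dst] += alpha * rank[src] * w / out_weight[src]
        delta = sum(abs(new[n] - rank[n]) for n in nodes)
        rank = new
        if delta < tol:
            break
    return rank


# Toy graph (made up for illustration): all teleport mass goes to node "a".
toy_edges = {("a", "b"): 1.0, ("b", "c"): 2.0, ("b", "a"): 1.0, ("c", "a"): 1.0}
ranks = personalized_pagerank(toy_edges, seeds={"a": 1.0})
```

The seed weights only shape the teleport distribution, which is why a large seed set is cheap to support: it changes one vector, not the iteration itself.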

> Here we are talking about graphs with millions of nodes/edges,

That ought to be enough for anybody.

> would TC support that

TC is a purely structural algorithm: it counts triangles, so it doesn't take any weights into consideration. It does, however, return a vector of normalized rankings from 0.0 to 1.0, which you could combine with an existing biasing strategy to boost results that have strong centrality.
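As a rough sketch of what "combine with an existing biasing strategy" could look like: the code below uses raw per-node triangle counts scaled by the maximum as a stand-in for a 0.0 to 1.0 structural score. This is not the TC algorithm itself, and `triangle_scores`, `boost`, the `strength` parameter, and the example data are hypothetical names and values chosen for illustration.

```python
from itertools import combinations


def triangle_scores(adj):
    """adj: {node: set(neighbors)} for an undirected graph.

    Returns per-node triangle counts scaled into [0.0, 1.0].
    """
    counts = {}
    for node, nbrs in adj.items():
        # A triangle through `node` is a pair of its neighbors that are adjacent.
        counts[node] = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
    top = max(counts.values(), default=0)
    return {n: (c / top if top else 0.0) for n, c in counts.items()}


def boost(relevance, structural, strength=0.5):
    """Multiply each relevance score by (1 + strength * structural score)."""
    return {n: s * (1.0 + strength * structural.get(n, 0.0))
            for n, s in relevance.items()}


# Toy graph (made up): a, b, c form a triangle; d hangs off c.
adj = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"}, "d": {"c"}}
structural = triangle_scores(adj)

# Hypothetical scores from an existing strategy, e.g. personalized PageRank.
relevance = {"a": 0.4, "d": 0.4}
boosted = boost(relevance, structural)
```

A multiplicative boost leaves the existing ranking untouched for nodes with no structural signal, so the weights (node or edge) keep doing their job and the structural score only breaks ties or promotes well-connected results.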

  • Hah, indeed. We are doing billion-scale real-time graph RAG in louie.ai for fairly regular tasks, so your sentiment resonates ;-)

    For something like uploading a big folder of documents, I agree with the OP: it's pretty straightforward, and naive in-memory with out-of-the-box embeddings, LLMs, retrieval, and untuned DBs goes far. I expect most vector-supporting DBaaS and LLMaaS to be offering this in the new year. OpenAI, Claude, and friends are already going in this direction, leaving the RAG techniques opaque for now.

    (Something folks may not appreciate, and I think is important about what's being done here, is the incremental update aspect.)