← Back to context

Comment by duped

2 days ago

How are the subsets defined?

At every round of the algorithm, all nodes with 0 in-degree (i.e. they are not depending on anyone) are collected as a dependence-free subset.

They serve as the root set to the rest of the graph for the current round. The depending nodes reached from root set have their in-degree decremented. When their in-degrees reach 0, they are added to the next root set.

I'm using double-buffering to maintain the current root set for processing and to collect the next root set for the next round, instead of using a queue as in Kahn's algorithm. At the end of the round, I simply swap the double-buffers. It's very efficient. When the next root set is empty, all nodes have been processed.

  • Do you define "in degree" as the number of incoming edges from nodes that have not been visited/sorted yet?

    I believe what you've implemented is equivalent to Kahn's algorithm. Your double buffer is the queue.

    • It's a variant to the Kahn's algorithm. Here's the description in the comment on the function.

        // This is a variant to the Kahn's algorithm, with additions on
        // finding dependence-free subsets and finding the cyclic nodes.
        //
        // This algorithm iteratively finds out the root sets of the graph.
        // 1. Find the first root set of the graph.
        // 2. Remove the nodes of the root set from the graph.
        // 3. Find the next root set. Go to 2 until the graph is empty.
        // A root set consists of nodes depending on no other nodes, i.e.
        // nodes whose incoming link count is 0.
      

      The successive root sets form a topological order.

      The in-degree of a node is the count of the incoming links of the leading nodes that the node is depending on. The in-degree can change as other nodes are removed from the graph.

      Kahn's algorithm uses a queue to hold the pending nodes and works on one node at a time. My insight is that it's simpler to treat the root nodes as sets, and to partition the root nodes into the current root set and the next root set, which work nicely with the double-buffer mechanism. As a benefit, the nodes in a root set are dependence-free with regard to the current round and can be used for parallel processing.