Comment by jltsiren

3 months ago

When there is no clear canonical way of implementing something, adding it to a programming language (or a standard library) is risky. All too often, you realize too late that you made a wrong choice, and then you add a second version. And a third. And so on. And then you end up with a confusing language full of newbie traps.

Graphs are a good example, as they are a large family of related structures. For example, are the edges undirected, directed, or something more exotic? Do the nodes/edges have identifiers and/or labels? Are all nodes/edges of the same type, or are there multiple types? Can you have duplicate edges between the same nodes? Does that depend on the types of the nodes/edges, or on the labels?

1 comment

jltsiren

WorldMaker 3 months ago

Even the raw storage for graphs doesn't have just one answer: you could store edge lists or you could store adjacency matrixes. Some algorithms work better with one, some work better with the other. You probably don't want to store both because that can be extra memory overhead as well as a locking problem if you need to atomically update both at once. You probably don't want to automatically flip back and forth between representations because that could cause garbage collector churn if not also long breadth or depth searches, and you may not want to encourage manual conversions between data structures either (to avoid providing a performance footgun to your users). So you probably want the edge list Graph type and the adjacency matrix Graph type to look very different, even though (they are trivially convertible they may be expensive to convert as mentioned), and yeah that's the under-the-hood storage mechanism. From there you get into possible exponential explosion as you start to get into the other higher level distinctions between types of graphs (DAGs versus Trees versus cyclic structures and so forth, and all the variations on what a node can be, if edges can be weighted or labeled, etc).