Comment by cmrdporcupine

5 hours ago

It'd going to get slower as the number of hops get bigger regardless of whether you accomplish that via relational join or pointer traversal. And doing it the former gives great opportunity for optimization during execution through vectorization or more appropriate index data structure usage.

Modern computers are also much more efficient at batch / vector processing than they are pointer hops. By either CPU or GPU, the whole system is much more tuned for working with data as sets / tensors rather than "hopping" through the data via pointer.

How you materialize your results for processing is your own business. The advantage of the relational model is its consistent throughout; the source data, the manipulation format for the operators, and the output are all relations. You can visualize it as "flat table" but that's an unimaginative visualization. You can just as easily twist that into a nested hierarchical structure in an object oriented language if you're so unfortunate as to be stuck working that way.

A "table" is only a visualization of the data, not the "form" the data actually takes and it's unfortunate that SQL chose this word.

It's better to conceive of a relation as a series of facts or propositions about the world. Each "row" is a statement. When read that way, it's a lot more elegant.

Neo4j-style index-free adjacency gives you O(1) per-hop neighbor lookup per node; cost is proportional to the subgraph actually visited. A k-way self-join on a relational edge table has to produce and filter intermediate tuples at each join step, and intermediate result size can blow up multiplicatively even if the final result is small. A k-way self-join on a relational edge table has to produce and filter intermediate tuples at each join step, and intermediate result size can blow up multiplicatively even if the final result is small. For deep traversals with selective endpoints, the relational plan's intermediate materialization is exactly the problem native graph engines were built to avoid.

You can just as easily see nodes and edges in a property graph as propositions about the world. The nice thing is that you can model relationships between entities as first class entities. nodes have the implicit property of being non-fungible.

Do you know of any relational database that returns a query result as normalized tables the way neo4j returns a sub-graph?

  • Honestly, I won't convince you, so I'll leave you to the pleasure of your pointer chasing, L1 cache misses, multiplicative blowups, and fixed / inflexibly structured taxonomies.

    • In practice if you index the right properties and have the right relationships in Neo4j then queries are very fast. And neo4j doesn't have taxonomies. You might be thinking of RDF. One of the best things about Neo4j compared to relational databases is the lack of a rigid schema.