← Back to context

Comment by UltraSane

8 hours ago

The advantage for property graph databases using Cypher query language is that the queries for things like "show me all systems connected to this system by links greater than 10Gbps up to n hops away" are vastly easier to write and faster to complete compared to SQL and relational databases. Cypher lets you easily search for arbitrary graph patters and the result is also a graph, not a denormalized table.

Parent commenter was asking compare to datalog (not SQL) which eats recursive graph transitions like this for lunch, making the queries very elegant to read ... while still staying relational.

I'm personally of the opinion that "graph databases" should be relational databases; the relational model can subsume "graph" queries, but for all the reasons Codd laid out back in the 60s... network (aka connected graph) databases cannot do the latter.

Let the query planner figure out the connectivity story, not a hardcoded data model.

  % 1. Base case: Directly connected systems (1 hop) with   bandwidth > 10
  fast_path(StartSys, EndSys, 1) :- 
      link(StartSys, EndSys, Bandwidth), 
      Bandwidth > 10.

  % 2. Recursive case: N-hop connections via an intermediate system
  fast_path(StartSys, EndSys, Hops) :- 
      fast_path(StartSys, IntermediateSys, PrevHops), 
      link(IntermediateSys, EndSys, Bandwidth), 
      Bandwidth > 10,
      Hops = PrevHops + 1.

  % 3. The Query: Find all systems connected to 'System_A' within 5 hops
  ?- fast_path('System_A', TargetSystem, Hops), Hops <= 5.

or in RelationalAI's "Rel" language, such as I remember it, this is AI assisted it could be wrong:

  // 1. Base case: Directly connected systems (1 hop)
  def fast_path(start_sys, end_sys, hops) =
    exists(bw: link(start_sys, end_sys, bw) and bw > 10 and hops = 1)

  // 2. Recursive case: Traverse to the next system
  def fast_path(start_sys, end_sys, hops) =
    exists(mid_sys, prev_hops, bw:
      fast_path(start_sys, mid_sys, prev_hops) and
      link(mid_sys, end_sys, bw) and bw > 10 and hops = prev_hops + 1)

  // 3. The Query: Select targets connected to "System_A" within 5 hops
  def output(target_sys, hops) =
    fast_path("System_A", target_sys, hops) and hops <= 5

https://www.relational.ai/post/graph-normal-form

https://www.dataversity.net/articles/say-hello-to-graph-norm...

...

That said, modern SQL can do this just fine, just... much harder to read.

  WITH RECURSIVE fast_path AS (
    -- 1. Base case: Directly connected systems from our starting node
    SELECT
      start_sys,
      end_sys,
      1 AS hops
    FROM link
    WHERE start_sys = 'System_A' AND bandwidth > 10
    UNION ALL

    -- 2. Recursive case: Traverse to the next system
    SELECT 
      fp.start_sys, 
      l.end_sys, 
      fp.hops + 1
    FROM fast_path fp
    JOIN link l ON fp.end_sys = l.start_sys
    WHERE l.bandwidth > 10 AND fp.hops < 5
  )

  -- 3. The Query: Select the generated graph paths
  SELECT * FROM fast_path;

  • JOINS make these kinds of queries get slower as the number of hops gets larger. And property graph databases have the big advantage of not having to mutilate their query results to fit into a flat table. A path query returns a path object of connected nodes. Property graphs are superior for applications with deep, variable-length connections, such as social networks, recommendation engines, fraud detection, and IT network mapping. Property graph databases work well with object oriented programming where objects map to nodes very well.

    RelationalAI's model is very cool but it is cloud only software.

    • It'd going to get slower as the number of hops get bigger regardless of whether you accomplish that via relational join or pointer traversal. And doing it the former gives great opportunity for optimization during execution through vectorization or more appropriate index data structure usage.

      Modern computers are also much more efficient at batch / vector processing than they are pointer hops. By either CPU or GPU, the whole system is much more tuned for working with data as sets / tensors rather than "hopping" through the data via pointer.

      How you materialize your results for processing is your own business. The advantage of the relational model is its consistent throughout; the source data, the manipulation format for the operators, and the output are all relations. You can visualize it as "flat table" but that's an unimaginative visualization. You can just as easily twist that into a nested hierarchical structure in an object oriented language if you're so unfortunate as to be stuck working that way.

      A "table" is only a visualization of the data, not the "form" the data actually takes and it's unfortunate that SQL chose this word.

      It's better to conceive of a relation as a series of facts or propositions about the world. Each "row" is a statement. When read that way, it's a lot more elegant.

      3 replies →