Comment by AlotOfReading

6 hours ago

I'm not terribly familiar with graph databases, but perhaps someone who is can explain the advantage of this awfully complicated-seeming design. There's Gremlin, Cypher, Yjs, and Zod, all of which I understand are different languages for different problems.

What's the advantage of using all these different things in one system? You can do all of this in datalog. You get strong eventual consistency naturally. LLMs know how to write it. It's type safe. JS implementations exist [0].

[0] https://github.com/tonsky/datascript

Gremlin-like API gives end to end type safety if you're querying the database from TypeScript. This was the original motivation for the library.
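To illustrate what end-to-end type safety can mean for a Gremlin-style traversal, here is a minimal sketch. Everything in it (`Schema`, `Graph`, `Traversal`, `V`, `has`, `values`) is a hypothetical stand-in over an in-memory array, not the library's actual API.

```typescript
// Hypothetical schema: vertex labels mapped to their property types.
type Schema = {
  system: { name: string; rack: number };
  person: { name: string; email: string };
};

type Vertex<L extends keyof Schema> = { label: L; props: Schema[L] };
type AnyVertex = { [L in keyof Schema]: Vertex<L> }[keyof Schema];

// A tiny fluent traversal over vertices of one label.
class Traversal<L extends keyof Schema> {
  constructor(private readonly items: Vertex<L>[]) {}

  // Both the key and the value are checked against the label's properties.
  has<K extends keyof Schema[L]>(key: K, value: Schema[L][K]): Traversal<L> {
    return new Traversal<L>(this.items.filter((v) => v.props[key] === value));
  }

  // The element type of the result is derived from the requested key.
  values<K extends keyof Schema[L]>(key: K): Schema[L][K][] {
    return this.items.map((v) => v.props[key]);
  }
}

class Graph {
  constructor(private readonly vertices: AnyVertex[]) {}

  // Selecting by label fixes L for the rest of the chain.
  V<L extends keyof Schema>(label: L): Traversal<L> {
    const matches = this.vertices.filter((v) => v.label === label);
    return new Traversal<L>(matches as unknown as Vertex<L>[]);
  }
}

// Made-up sample data.
const g = new Graph([
  { label: "system", props: { name: "System_A", rack: 1 } },
  { label: "system", props: { name: "System_B", rack: 2 } },
  { label: "person", props: { name: "Ada", email: "ada@example.com" } },
]);

// Inferred as string[] because `name` is a string on `system`.
const names: string[] = g.V("system").has("rack", 2).values("name");
```

With this shape, `g.V("system").has("email", "x")` fails to compile, because `email` is not a property of `system`.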

Zod/Valibot/ArkType/Standard Schema support because you need a way to define your schema and this allows for that at runtime and compile time.
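The "runtime and compile time" point can be sketched without any library: one definition produces both a run-time check and a static type. The helpers below (`Validator`, `Infer`, `str`, `num`, `objectOf`) are hand-rolled assumptions for illustration, not the Zod, Valibot, or ArkType API, although `Infer` plays the role that `z.infer` plays in Zod.

```typescript
// A validator checks an unknown value at run time and carries its static type T.
type Validator<T> = { parse(input: unknown): T };

// Recover the static type from a validator.
type Infer<V> = V extends Validator<infer T> ? T : never;

const str: Validator<string> = {
  parse(input) {
    if (typeof input !== "string") throw new TypeError("expected string");
    return input;
  },
};

const num: Validator<number> = {
  parse(input) {
    if (typeof input !== "number") throw new TypeError("expected number");
    return input;
  },
};

// Build an object validator from a shape of property validators.
function objectOf<S extends Record<string, Validator<unknown>>>(
  shape: S,
): Validator<{ [K in keyof S]: Infer<S[K]> }> {
  return {
    parse(input) {
      if (typeof input !== "object" || input === null) {
        throw new TypeError("expected object");
      }
      const record = input as Record<string, unknown>;
      const out: Record<string, unknown> = {};
      for (const key of Object.keys(shape)) {
        out[key] = shape[key].parse(record[key]);
      }
      return out as { [K in keyof S]: Infer<S[K]> };
    },
  };
}

// One definition: a run-time check and a compile-time type.
const SystemProps = objectOf({ name: str, rack: num });
type SystemPropsType = Infer<typeof SystemProps>; // { name: string; rack: number }

const parsed: SystemPropsType = SystemProps.parse({ name: "System_A", rack: 1 });
```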

Y.js as a backing store because I needed to support offline sync, branching/forking, and I use Y.js for collaborative editing in my product, so I needed to be able to store the various CRDT types as properties within the graph. e.g. you can have a `description` property on your vertices or edges that is backed by a Y.Text or Y.XmlElement

Cypher because until the arrival of codemode it wasn't feasible to have LLMs write queries using the Gremlin-like API and LLMs already know Cypher.

Most of all though, this was an experiment that ended up being useful.

The advantage of property graph databases with the Cypher query language is that queries like "show me all systems connected to this system by links greater than 10Gbps up to n hops away" are vastly easier to write and faster to complete than in SQL on a relational database. Cypher lets you easily search for arbitrary graph patterns, and the result is also a graph, not a denormalized table.
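To pin down what that example query asks for, here is a rough in-memory sketch as a bounded breadth-first search. The `Link` type, the `reachableWithin` function, and the sample links are all made up for illustration; links are treated as directed, and only the shortest hop count per system is kept.

```typescript
type Link = { from: string; to: string; gbps: number };

// Returns every system reachable from `start` in at most `maxHops` links,
// following only links faster than `minGbps`, with its shortest hop count.
function reachableWithin(
  links: Link[],
  start: string,
  minGbps: number,
  maxHops: number,
): Map<string, number> {
  const hopsTo = new Map<string, number>();
  const seen = new Set<string>([start]);
  let frontier: string[] = [start];

  for (let hop = 1; hop <= maxHops && frontier.length > 0; hop++) {
    const next: string[] = [];
    for (const link of links) {
      const fastEnough = link.gbps > minGbps;
      if (fastEnough && frontier.includes(link.from) && !seen.has(link.to)) {
        seen.add(link.to); // the seen set is what makes cycles harmless
        hopsTo.set(link.to, hop);
        next.push(link.to);
      }
    }
    frontier = next;
  }
  return hopsTo;
}

// Made-up sample network with one slow link, one borderline link, and a cycle.
const links: Link[] = [
  { from: "System_A", to: "System_B", gbps: 40 },
  { from: "System_B", to: "System_C", gbps: 100 },
  { from: "System_C", to: "System_D", gbps: 1 },
  { from: "System_A", to: "System_E", gbps: 10 },
  { from: "System_C", to: "System_A", gbps: 40 },
];

const result = reachableWithin(links, "System_A", 10, 5);
// result maps System_B to 1 and System_C to 2; System_D and System_E are excluded.
```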

  • Parent commenter was asking for a comparison to Datalog (not SQL), which eats recursive graph traversals like this for lunch, making the queries very elegant to read ... while still staying relational.

    I'm personally of the opinion that "graph databases" should be relational databases; the relational model can subsume "graph" queries, but for all the reasons Codd laid out back around 1970... network (aka connected graph) databases cannot do the reverse.

    Let the query planner figure out the connectivity story, not a hardcoded data model.

      % 1. Base case: Directly connected systems (1 hop) with bandwidth > 10
      fast_path(StartSys, EndSys, 1) :-
          link(StartSys, EndSys, Bandwidth),
          Bandwidth > 10.
    
      % 2. Recursive case: N-hop connections via an intermediate system.
      %    The PrevHops < 5 bound keeps the hop counter finite on cyclic graphs.
      fast_path(StartSys, EndSys, Hops) :-
          fast_path(StartSys, IntermediateSys, PrevHops),
          PrevHops < 5,
          link(IntermediateSys, EndSys, Bandwidth),
          Bandwidth > 10,
          Hops = PrevHops + 1.
    
      % 3. The Query: Find all systems connected to 'System_A' within 5 hops
      ?- fast_path('System_A', TargetSystem, Hops), Hops <= 5.
    

    Or in RelationalAI's "Rel" language, as best I remember it (this is AI-assisted, so it could be wrong):

      // 1. Base case: Directly connected systems (1 hop)
      def fast_path(start_sys, end_sys, hops) =
        exists(bw: link(start_sys, end_sys, bw) and bw > 10 and hops = 1)
    
      // 2. Recursive case: Traverse to the next system
      def fast_path(start_sys, end_sys, hops) =
        exists(mid_sys, prev_hops, bw:
          fast_path(start_sys, mid_sys, prev_hops) and
          link(mid_sys, end_sys, bw) and bw > 10 and hops = prev_hops + 1)
    
      // 3. The Query: Select targets connected to "System_A" within 5 hops
      def output(target_sys, hops) =
        fast_path("System_A", target_sys, hops) and hops <= 5
    

    https://www.relational.ai/post/graph-normal-form

    https://www.dataversity.net/articles/say-hello-to-graph-norm...

    ...

    That said, modern SQL can do this just fine, just... much harder to read.

      WITH RECURSIVE fast_path AS (
        -- 1. Base case: Directly connected systems from our starting node
        SELECT
          start_sys,
          end_sys,
          1 AS hops
        FROM link
        WHERE start_sys = 'System_A' AND bandwidth > 10
        UNION ALL
    
        -- 2. Recursive case: Traverse to the next system
        SELECT 
          fp.start_sys, 
          l.end_sys, 
          fp.hops + 1
        FROM fast_path fp
        JOIN link l ON fp.end_sys = l.start_sys
        WHERE l.bandwidth > 10 AND fp.hops < 5
      )
    
      -- 3. The Query: Select the generated graph paths
      SELECT * FROM fast_path;

    • JOINs make these kinds of queries slower as the number of hops grows. And property graph databases have the big advantage of not having to mutilate their query results to fit into a flat table: a path query returns a path object of connected nodes. Property graphs are superior for applications with deep, variable-length connections, such as social networks, recommendation engines, fraud detection, and IT network mapping. They also work well with object-oriented programming, since objects map naturally to nodes.

      RelationalAI's model is very cool, but it is cloud-only software.