Comment by adsharma
1 day ago
PGQ requires you to write using SQL and read using a graph query language. GQL is a standalone language that supports reads/writes. But much of the community is still using cypher.
https://adsharma.github.io/beating-the-CAP-theorem-for-graph...
As far as I can tell, this has nothing to do with CAP theorem or distributed systems. It's just being used as an analogy.
> [CAP theorem] states that any distributed storage system can provide only two of these three guarantees: Consistency, Availability and Partition safety.
> In the realm of graph databases, we observe a similar “two out three” situation. You can either have scalable systems that are not fully open source or you can have open source systems designed for small graphs. Details below.
(the article follows)
> This is one solution to the CAP theorem for graphs. We can store a billion scale graph using this method in parquet files and use a free, cheap and open source solution to traverse them, perform joins without storage costs that are prohibitively high.
That's right - it was a fun 2 out of 3 analogy.
The real question being raised in the blog post is - should the next generation graph databases pursue a local-only embedded strategy or build on top of object storage like many non-graph and vector embedded databases are doing.
Specifically, DuckLake (using system catalog for metadata instead of JSON/YAML) is interesting. I became aware of Apache GraphAr (incubating) after writing the blog post. But it seems to be designed for data interchange between graph databases instead of designing for primary storage.
I only mentioned it because I clicked it wondering if someone had found a way to "cheat" CAP for graph databases. When I saw that it was being used as an analogy and not literally, I figured I'd comment.
I still don't quite get the analogy. What are the 2 out of 3 that you can have? The second paragraph I quoted gives a classic 1 out of 2 dilemma - either scalable _or_ open-source.
1 reply →