
Comment by adsharma

1 day ago

That's right - it was a fun 2 out of 3 analogy.

The real question the blog post raises is: should next-generation graph databases pursue a local-only embedded strategy, or build on top of object storage, as many non-graph and vector embedded databases are doing?

Specifically, DuckLake (which uses a system catalog for metadata instead of JSON/YAML) is interesting. I became aware of Apache GraphAr (incubating) after writing the blog post, but it seems to be designed for data interchange between graph databases rather than as a primary storage format.

I only mentioned it because I clicked through wondering whether someone had found a way to "cheat" CAP for graph databases. When I saw it was being used as an analogy rather than literally, I figured I'd comment.

I still don't quite get the analogy. What are the 2 out of 3 that you can have? The second paragraph I quoted gives a classic 1-out-of-2 dilemma: either scalable _or_ open source.

  • DuckDB is scalable (it can handle TPC-H at 1TB) and open source, but doesn't support graphs natively. It supports some graph queries on SQL-native columnar storage (a small sketch of such a query follows after this list).

    With the proposed solution, you'll be able to query larger graphs on an open-source, graph-native engine, thus beating the "CAP theorem for graphs".
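
To make the DuckDB point concrete, here is a minimal sketch (not from the blog post) of the kind of graph query DuckDB can already run over plain SQL columnar storage: reachability over an edge list via a recursive CTE. The table name and data are hypothetical; in practice the edge list could just as well live in Parquet files on object storage.

```python
import duckdb

con = duckdb.connect()  # in-memory database

# Hypothetical edge list stored as an ordinary columnar table.
con.execute("CREATE TABLE edges (src INTEGER, dst INTEGER)")
con.execute("INSERT INTO edges VALUES (1, 2), (2, 3), (3, 4), (5, 6)")

# All nodes reachable from node 1, expressed as a recursive CTE.
# UNION (rather than UNION ALL) deduplicates rows, which also keeps
# the recursion from looping forever if the graph has cycles.
reachable = con.execute("""
    WITH RECURSIVE reach(node) AS (
        SELECT 1
        UNION
        SELECT e.dst
        FROM edges e
        JOIN reach r ON e.src = r.node
    )
    SELECT node FROM reach ORDER BY node
""").fetchall()

print(reachable)  # [(1,), (2,), (3,), (4,)]
```

This works, but every traversal is spelled out as a join-per-hop, which is exactly the gap a graph-native engine on the same kind of storage would close.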