← Back to context

Comment by GermanJablo

2 hours ago

> But with some optimisation work, CRDTs perform better than OT based systems.

I read your paper and I think this is a mistake. You assume that OT has quadratic complexity because you're considering classic operation-based OT. But OT can be id-based, in which case operations are transformed directly on the document, not on other operations. This is essentially CRDT without the problems of supporting P2P, and therefore the best CRDT will never perform better than the best OT.

> CRDTs let you scale your backend. OT (usually) requires server affinity. CRDT based systems are way more flexible. Personally I'd rather complex code and simpler networking than the other way around.

All productivity apps that use these tools in any way shard by workspace or user, so OT can scale very well.

If you don't scale CRDT that way, by the way, you'd be relying too much on "eventual consistency" instead of "consistency as quickly as possible."

> (For anyone who's read the papers, I'm conflating OT == the old Jupiter-based OT algorithm that's popular in Google Docs and others.)

Similar to what I said before. I think limiting OT to an implementation that’s over three decades old doesn’t do OT justice.

> I think limiting OT to an implementation that’s over three decades old doesn’t do OT justice.

I haven't kept up with the OT literature after a string of papers turned out to "prove correctness" in systems which later turned out to have bugs. And so many of these algorithms have abysmally bad performance. I think I implemented an O(n^4) algorithm once to see if it was correct, but it was so slow that I couldn't even fuzz test it properly.

> You assume that OT has quadratic complexity because you're considering classic operation-based OT. But OT can be id-based, in which case operations are transformed directly on the document, not on other operations.

If you go down that road, we can make systems which are both OT and CRDT based at the same time. Arguably my eg-walker algorithm is exactly this. In eg-walker, we transform operations just like you say - using an in memory document model. And we get most of the benefits of OT - including being able to separately store unadorned document snapshots and historical operation logs.

Eg-walker is only a CRDT in the sense that it uses a grow-only CRDT of operations, shared between peers, to get the full set of operations. The real work is an OT system, that gets run on each peer to materialise the actual document.

> This is essentially CRDT without the problems of supporting P2P, and therefore the best CRDT will never perform better than the best OT.

Citation needed. I've published plenty of benchmarks over the years from real experiments. If you think I'm wrong, do the work and show data.

My contention is that the parts of a CRDT which make them correct in P2P settings don't cost performance. What actually matters for performance is using the right data structures and algorithms.