Comment by RangerScience
5 years ago
Interesting. I might be adding real-time edit syncing to a hobby project sometime soon. Can you share more about the trade-offs?
5 years ago
Interesting. I might be adding real-time edit syncing to a hobby project sometime soon. Can you share more about the trade-offs?
I haven't yet completely watched Martin's talk on CRDTs, so I might come back and stand corrected. For now these are some well known trade-offs
A central server: Most OT algorithms depend on a central system for intention preservation. CRDTs are truly distributed and need no central server at all.
Memory: Traditionally CRDTs consume more memory because deletions are preserved. OT lets you garbage collect some operations since a central system is already recording those ops and sequencing them as well.
Analysing and cancelling ops: OT lets you easily analyse incoming ops and modify/dummy-ify/cancel them without breaking the consistency. This convenience is not necessary for most cases, but really important for rich-text editing. For example when someone merges a couple of table cells when another user is deleting a column, we need to analyze these operations and modify them so as not to end-up with an invalid table structure.
For building P2P / decentralized apps, could OT be used with some kind of leader election system?
Absolutely. And as a next step, you distribute your OT per document so that no one of them has too much load. This is exactly how Google docs work and for the most part it works well.
The downside is that if a single document gets a spike of traffic because it goes viral, then that document won't scale.
1 reply →
Seems like another one (based off the article) is ease of use as well. I'm not familiar with either algorithm, but sounds like OT is less complex and easier to understand, which IMO is a decent tradeoff worth considering.
Having worked a little with both, my impression is that OT can get very complex in implementation edge cases. CRDTs are incredibly difficult to design, but if you successfully design one which can model your features, implementation is pretty straightforward.
A real world implication is that if you want to add a new operation to a system (like, table column merge, or moving a range of text), with OT, you can probably find a way to extend what you have to get it in there, with a painfully non-linear cost as you add more new operations. With CRDTs, you may find yourself entirely back at the drawing board. But the stuff you do support, you will support pretty well and reliably...
Personally, I prefer CRDTs for their elegance, but it can be difficult in a world of evolving requirements
2 replies →
I agree complexity is worth considering, though part of me wonders how important that is in this case. The reason for this intuition is that this is one of core parts of what they’re selling.
If you’re going to invest your complexity budget somewhere, it seems like this is a good place for companies dealing with these structures.
Dealing with text is still an active area of research for CRDTs. While the problem has been theoretically solved, the solutions require much more memory/bandwidth than OT does.[1] Conversely, CRDTs are significantly better at replicating graphs.
yjs[2] is one CRDT that handles text reasonably well, but it can still run into performance edge cases (as they plainly/honestly admit in their README).
[1]: https://github.com/automerge/automerge/issues/89 [2]: https://github.com/yjs/yjs
Have you watched the presentation referenced in the article? CRDTs have come a long way. You can get down to about a 150% increase in size over a plain text representation. Since most rich text formats store the edit history anyway, this is closer to feasible than the numbers might suggest.
The transform operation is more simple if you know the order of things. For example in OT: nr2) Delete H from index 0. nr1) Insert "Hello" at index 0. You know that nr1 should come before nr2 because of a central counter. But with CRDT it's a) Delete character id 0, b) Insert "Hello" at character with id 0.
I think intention preservation requires that you know what the clients know about. So you still need something like a vector clock to model what the dependency relationship between nr2 and nr1 are.
Yes, in OT the clients can send what they think is the global counter. Then each client or the server can transform the edit operation from the edit history eg. move the index left/right. In CRDT each index is instead an ID and deleted characters has to be saved as "tomestones" so you know where they where.