Comment by deegles

5 years ago

What if the document starts empty and syncing doesn't happen until everyone presses submit? Will CRDTs produce a valid document? Yes. Will it make any sense? Who knows. I think that's what OP is getting at.

I read it as a question regarding OT vs. CRDTs, which I believe would produce similar results even under heavy concurrency. In terms of larger edits or refactors, you’d probably need to do something else, e.g. lock the document or section, unshare the document, use some sort of higher-level CRDT that ships your changes atomically and forces a manual merge on concurrent edits, etc. None of these necessarily require a central server, though they may require an active session between participants.

I should also note that even if you use regular merge, and the end state of a text document is a complete mess after a refactor + concurrent edits, there’s enough data in the tree to simply pull out any concurrent contributions. They could then be reapplied manually if needed. Perhaps the app could even notice this automatically and provide an optional UI for this process. Similarly, it would be possible for the concurrent editors to remove the refactor edits and thus “fork” their document.
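To make "pull out any concurrent contributions" a bit more concrete, here is a minimal sketch. It assumes every operation carries a vector clock (a map from site ID to the number of operations seen from that site), which simplifies the tree mentioned above. The `Op` class and the helpers `leq`, `concurrent`, and `concurrent_with` are hypothetical names for illustration, not part of any particular CRDT library.

```python
from dataclasses import dataclass

# Assumption: a vector clock is a plain dict, site id -> ops seen from that site.
Clock = dict[str, int]


def leq(a: Clock, b: Clock) -> bool:
    """True if b has seen everything a has seen (a happened before or equals b)."""
    return all(count <= b.get(site, 0) for site, count in a.items())


def concurrent(a: Clock, b: Clock) -> bool:
    """Two clocks are concurrent when neither has seen the other."""
    return not leq(a, b) and not leq(b, a)


@dataclass
class Op:
    # Hypothetical operation record; a real CRDT would store much more.
    author: str
    text: str
    clock: Clock


def concurrent_with(ops: list[Op], target: Op) -> list[Op]:
    """Return the operations made without knowledge of `target`, and vice versa."""
    return [
        op for op in ops
        if op is not target and concurrent(op.clock, target.clock)
    ]


draft = Op("alice", "draft", {"alice": 1})
refactor = Op("alice", "big refactor", {"alice": 2})
typo_fix = Op("bob", "fix typo", {"alice": 1, "bob": 1})  # saw the draft, not the refactor
reply = Op("carol", "reply", {"alice": 2, "bob": 1, "carol": 1})  # saw everything

for op in concurrent_with([draft, refactor, typo_fix, reply], refactor):
    print(op.author, op.text)
```

Only bob's typo fix is reported here, since it is the one edit made without having seen the refactor. An app could surface that list in a UI so the edits can be reapplied by hand.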

  • My question was not meant to be OT versus CRDT. Rather, I am questioning the expectations for that shared-editing use case.

    Comparing to git (as others have done) is interesting. The expectation there is that every merge is manually tested by the user, so it is not just the git operations at play but all the supporting activity around them. That is, the workflow assumes every intermediate state is touched and verified by a user. Where this is skipped, the risk of breakage goes up. (This is why git bisect often fails on projects that don't build at every commit.)

    Same for games. Some machine gets to set the record straight as to what actually happened. Pretty much always. The faster the path to the authority for every edit, the higher the chance of coherence.

    With hundreds of authorities, machine or not, this feels intractable.

    • I think CRDTs could compose nicely with a system that features a central authority or stronger semantics. For example, you could imagine a sequence of vector clocks, each a superset of the last (that is, each clock has seen everything the previous one has), stored in a separate data structure (and maybe hosted in a central location) that serves as the "main branch". The CRDT part would work as before, but any attempt to "commit" a vector clock that's concurrent with the top of "main" would be rejected. As in your example, every commit would be vetted by a human, and you'd get the best of both worlds.

      (But I think this would be unnecessary for most instances of real-time collaboration, since people tend to own and edit small portions of a document instead of jumping around and making drastic revisions all over the place. In fact, it should be very rare for two changes to actually be concurrent, unless an offline version of a document is being merged in. I would agree that "100 people editing the same document offline and merging periodically" is a less than ideal use of CRDTs, but I think they could offer many benefits even in this scenario, especially if paired with a central ledger as described above.)
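A rough sketch of the "main branch" ledger idea, under stated assumptions: vector clocks are plain dicts from site ID to counter, and `MainBranch` with its `commit` method is a hypothetical name invented here, not an existing API. The sketch only accepts a clock that has seen everything the current top has seen. That rejects concurrent clocks as described, and as a side effect of this simplification it also rejects stale ones.

```python
# Assumption: a vector clock is a plain dict, site id -> ops seen from that site.
Clock = dict[str, int]


def leq(a: Clock, b: Clock) -> bool:
    """True if b has seen everything a has seen."""
    return all(count <= b.get(site, 0) for site, count in a.items())


class MainBranch:
    """Hypothetical central ledger: an append-only list of vetted vector clocks."""

    def __init__(self) -> None:
        self.commits: list[Clock] = []

    def commit(self, clock: Clock) -> bool:
        """Append `clock` only if it includes the current top of main."""
        if self.commits and not leq(self.commits[-1], clock):
            # Concurrent with (or older than) the top: the author has to merge
            # the latest main into their replica and review the result first.
            return False
        self.commits.append(dict(clock))
        return True


main = MainBranch()
print(main.commit({"alice": 1}))            # first commit, nothing to conflict with
print(main.commit({"alice": 1, "bob": 2}))  # bob built on top of alice's commit
print(main.commit({"alice": 2}))            # alice never saw bob's edits
print(main.commit({"alice": 2, "bob": 2}))  # alice merged bob's edits, then retried
```

The third call is the rejected one: alice's clock has no entry for bob, so it is concurrent with the top of main. The CRDT merge itself is untouched; the ledger only decides which merged states get the "vetted by a human" stamp.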