Comment by PaulDavisThe1st

5 years ago

We want something like that for distributed collaboration in Ardour, a cross-platform DAW. The relevant state is serialized as an XML file, and used natively as a complex object tree. Users want to be able to edit (in the DAW) locally, then share (and merge) their results with collaborators.

I've plugged this collaboration project a few times recently, and have no relationship to it other than discovering it (via YJS' "who is using" list[1]) and finding it fascinating:

http://cattaz.io/

What I find most interesting about it is that it has reduced the state of multiple 'smart' user-facing widgets/apps into a common, lowest-common-denominator format (a text document) that lends itself more easily and intuitively to collaborative editing and CRDT operations.

I don't know for sure whether this is the path forward for CRDT-based applications in general, but I think there are valuable ideas there. It does raise the possibility of the widgets/applications occasionally being in 'invalid' states; but rarely in a way that the human participants wouldn't notice or be able to fix themselves.

Whether that scales to the complexity of the state management for a multi-track audio editing session, I don't know; but it could be instructive to compare.

[1] - https://github.com/yjs/yjs#who-is-using-yjs

In real-time? Well, I have thoughts, but I'm not super familiar with Ardour itself, so I'm not sure if you're trying to merge during a live performance or if you're talking about more of a distributed studio recording session type of situation. I have working knowledge of Reason and Logic and ChucK (which I use with JACK and do some networked OSC stuff with, although I haven't touched it in a few years).

The approach we use at Jackbox for making the state of an Xbox game mutable to thousands of live viewers on Twitch is to have lots of little CRDT values, mostly just counters and sets of strings, and to merge the little values independently of one another. That is very different from editing a text document, which is typically structured as one big value.

I wonder if, for a DAW, you could merge at the track or control level instead of the workspace level. E.g., communicate the state of an individual fader as an independent value, and communicate either states or operations on that fader and have each client merge them. In this example, the fader's state would be encoded as a PN-counter with a range condition, and you'd replicate increment and decrement operations, like it was a networked rotary encoder. So every mutable thing in the DAW would be a value having operations that can be merged, instead of having a single big value representing the entire state of the DAW.

My use-case is also funky because I have potentially thousands of writers writing to the same key, but only a single reader, and the reader doesn't need an update at every change, so I use state-based CRDTs, but I think most other people using CRDTs use operation-based CRDTs. Also not sure how you would mutate two separate values transactionally or if that's a thing you even need.
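
To make the fader idea concrete, here is a rough sketch of a state-based PN-counter whose value is clamped to a range when read. This is not Jackbox's or Ardour's code; the class name `FaderCounter`, the 0..127 range, and the choice to clamp on read (rather than reject out-of-range operations) are all assumptions for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class FaderCounter:
    """State-based PN-counter; hypothetical model of one fader."""
    lo: int = 0      # assumed lower bound of the fader
    hi: int = 127    # assumed upper bound of the fader
    incs: dict = field(default_factory=dict)  # replica id -> total increments
    decs: dict = field(default_factory=dict)  # replica id -> total decrements

    def increment(self, replica, n=1):
        # Each replica only ever grows its own slot.
        self.incs[replica] = self.incs.get(replica, 0) + n

    def decrement(self, replica, n=1):
        self.decs[replica] = self.decs.get(replica, 0) + n

    def value(self):
        # The "range condition" is applied on read: the raw sum may
        # drift outside [lo, hi], but the visible value never does.
        raw = sum(self.incs.values()) - sum(self.decs.values())
        return max(self.lo, min(self.hi, raw))

    def merge(self, other):
        # Per-replica max is commutative, associative and idempotent,
        # so replicas converge regardless of merge order.
        merged = FaderCounter(self.lo, self.hi)
        for k in self.incs.keys() | other.incs.keys():
            merged.incs[k] = max(self.incs.get(k, 0), other.incs.get(k, 0))
        for k in self.decs.keys() | other.decs.keys():
            merged.decs[k] = max(self.decs.get(k, 0), other.decs.get(k, 0))
        return merged


a, b = FaderCounter(), FaderCounter()
a.increment("alice", 10)
b.increment("bob", 5)
b.decrement("bob", 2)
print(a.merge(b).value(), b.merge(a).value())
```

One caveat with clamping on read: if someone turns the "encoder" far past the end stop, the raw count keeps moving, so it takes the same number of turns back before the visible value changes again. Whether that is acceptable for a fader is a design question this sketch does not settle.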

  • Not realtime. Users would sync periodically during their working process.

    There are lots of mutable things in a DAW that are not numeric parameters.

    The state of a playlist (just think "some time ordered list of objects") is not treatable in the same way as the value of a fader.

    If you had a context-aware XML parser and access to timestamps for every XML node, you could do the human-aided merge by considering each node and just using the latest version of the same node, falling back to the human when there's a deletion conflict (for example). But this doesn't actually merge the attributes of a node or deal with conflicts between otherwise sequential edits.
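
A minimal sketch of that node-by-node merge, assuming each XML node has already been reduced to a stable id, a timestamp, and its serialized content, and that deletions are recorded as tombstones (content `None`) rather than by simply dropping the node. `NodeVersion` and `merge_nodes` are made-up names, not anything from Ardour.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class NodeVersion:
    timestamp: float
    content: Optional[str]  # None marks a deleted node (assumed tombstone)


def merge_nodes(local, remote):
    """Latest-timestamp-wins per node; deletion conflicts go to a human.

    Both arguments map node id -> NodeVersion. Returns the merged map
    and a list of node ids that need a manual decision.
    """
    merged, conflicts = {}, []
    for node_id in sorted(local.keys() | remote.keys()):
        a, b = local.get(node_id), remote.get(node_id)
        if a is None or b is None:
            # Only one side knows this node: take it as-is.
            merged[node_id] = a if b is None else b
        elif a == b:
            merged[node_id] = a
        elif (a.content is None) != (b.content is None):
            # One side deleted, the other still has the node: ask the user.
            conflicts.append(node_id)
        else:
            # Whole-node replacement; ties go to the local copy.
            merged[node_id] = a if a.timestamp >= b.timestamp else b
    return merged, conflicts


local = {
    "track1": NodeVersion(1.0, 'gain="0.5"'),
    "region7": NodeVersion(3.0, None),
}
remote = {
    "track1": NodeVersion(2.0, 'gain="0.8"'),
    "region7": NodeVersion(2.0, 'start="100"'),
}
merged, conflicts = merge_nodes(local, remote)
print(merged["track1"].content, conflicts)
```

Note that the winner replaces the node wholesale, which is exactly the limitation mentioned above: attributes changed on the losing side are discarded rather than merged.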