Comment by necovek

9 days ago

I agree that files solve some rudimentary cases, but they do not even allow simple conflict resolution. E.g., with compressed files, including container formats like OpenOffice documents (text files in a ZIP archive, IIRC), it might be simple to apply changes from two sides if they are in distant parts of the document, but syncing full files simply barfs.

Note that this does not even need two users: I hit this problem with a desktop and laptop and self-hosted NextCloud myself.

In general, what would help is a filesystem that stores both the raw data (to fall back to) and a per-format event log, and maybe even app-specific events. Imagine a PNG changes: we could have any change recorded as raw bytes, as a generic bitmap image operation like "modify pixels at x,y to ...", and as an app-specific log entry like "gimp: apply sharpen filter on polygon area ...".

This would allow the other side to attempt the smartest sync it is capable of: if it has a compatible version of gimp, it could decide to apply the filter; otherwise it could fall back to raw pixel changes if there are no conflicts, and finally to full file contents reconciliation.
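To make the fallback chain concrete, here is a minimal sketch of what a receiving side might do. Everything in it is hypothetical: the `ChangeRecord` layout, the handler registries and the `ConflictError` signal are assumptions for illustration, not an existing filesystem or gimp API. The generic operation works on bytes rather than pixels to keep it short.

```python
from dataclasses import dataclass
from typing import Optional


class ConflictError(Exception):
    """Raised by a handler that cannot apply its operation cleanly."""


@dataclass
class ChangeRecord:
    """One logged change, described at up to three levels (hypothetical layout)."""
    raw_bytes: bytes                   # full new file contents, always present
    generic_op: Optional[dict] = None  # e.g. {"op": "set_bytes", "offset": 1, "data": b"X"}
    app_op: Optional[dict] = None      # e.g. {"app": "gimp", "version": "x.y", ...}


def set_bytes(current: bytes, op: dict) -> bytes:
    """Generic handler: overwrite a byte range that must already exist."""
    off, data = op["offset"], op["data"]
    if off + len(data) > len(current):
        raise ConflictError("range is outside the current contents")
    return current[:off] + data + current[off + len(data):]


def apply_change(record, current, app_handlers, generic_handlers):
    """Try the most semantic level first, then fall back step by step."""
    if record.app_op is not None:
        key = (record.app_op["app"], record.app_op["version"])
        handler = app_handlers.get(key)  # only present with a compatible app
        if handler is not None:
            try:
                return "app", handler(current, record.app_op)
            except ConflictError:
                pass
    if record.generic_op is not None:
        handler = generic_handlers.get(record.generic_op["op"])
        if handler is not None:
            try:
                return "generic", handler(current, record.generic_op)
            except ConflictError:
                pass
    # Last resort: take the full file contents from the other side.
    return "raw", record.raw_bytes


record = ChangeRecord(
    raw_bytes=b"hXllo",
    generic_op={"op": "set_bytes", "offset": 1, "data": b"X"},
    app_op={"app": "gimp", "version": "x.y", "filter": "sharpen"},
)
level, result = apply_change(record, b"hello", {}, {"set_bytes": set_bytes})
```

With no gimp handler registered, the sketch lands on the generic level; with no handlers at all it would simply return the raw contents. Real reconciliation of full file contents would of course need more than "take the other side's bytes".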

Just as MIME handlers get registered today, if filesystems provided such change logs, applications could build very advanced sync systems on top of this support from "filesystems".

The log is just a block of data. All the burden to use the log is on the application so the OS is providing very little general functionality.

I’m also suspicious of logs as a general form of conflict resolution, as you are just hoping the two edits don’t touch the same area. And if they do, then you are left in an invalid state.
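The "same area" worry can be sketched as an overlap check on logged byte ranges. This is only an illustration under assumptions: the `Edit` tuple is a made-up log entry format, both sides log against the same base version, edits within one side do not overlap each other, and every edit replaces a non-empty range (two pure insertions at the same offset would need extra handling).

```python
from typing import List, NamedTuple, Optional


class Edit(NamedTuple):
    start: int   # offset into the common base version
    end: int     # exclusive end of the replaced range
    data: bytes  # replacement bytes


def overlaps(a: Edit, b: Edit) -> bool:
    """True if the two replaced ranges share at least one byte."""
    return a.start < b.end and b.start < a.end


def merge_edits(base: bytes, local: List[Edit], remote: List[Edit]) -> Optional[bytes]:
    """Apply both edit logs to base, or return None if any ranges collide."""
    for a in local:
        for b in remote:
            if overlaps(a, b):
                return None  # conflict: automation stops here
    out = base
    # Apply from the end of the file backwards so earlier offsets stay valid.
    for e in sorted(local + remote, key=lambda e: e.start, reverse=True):
        out = out[:e.start] + e.data + out[e.end:]
    return out


base = b"hello world"
clean = merge_edits(base, [Edit(0, 5, b"HELLO")], [Edit(6, 11, b"there")])
clash = merge_edits(base, [Edit(0, 5, b"HELLO")], [Edit(3, 8, b"X")])
```

Note that a clean merge at the byte level still says nothing about whether the result is valid for the file format, which is part of the objection above.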

You brought up zips. A pile of files seems like a way to divide up data so that it has more pieces that are mergeable/diffable.

For example “the folder can contain N files” or “there must be exactly 1 of this file”.

  • The log is certainly a blob of data, but the point is that it should be more granular, with clearer delineation of what are and what aren't conflicting changes: there will always be conflicting changes where no automation can really help.

    For zip and other container-type files, you'd have log entries to the tune of "changed contained file foo.png: ...".

    Operating systems would need to support some basic operations: container file operations (e.g. for zip files), basic bitmap image editing, basic text document diffing, structured text diffing (XML, JSON, YAML...), etc.

    Applications would provide OS-registered services (like MIME handlers are registered today) that can interpret and produce even more semantic events on top of the existing ones.

    The environment could offer an interface during "syncing" when it detects a conflict to resolve it using one of the generic (or not) conflict resolution mechanisms (use local or remote version completely; use incremental delta if there is some generic semantic diff; app-provided capability if present on both sides).

    Now, you are right that this can be implemented completely in user space, with the log being a regular file next to the file itself, but you will hit issues if you are not able to tie it nicely to things like write/fsync and similar syscalls.

    Obviously, for it to make sense, it needs to be widely accepted as the approach, which is what the local-first movement is trying to achieve with CRDTs.
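As a rough illustration of the container case discussed above, a three-way merge per contained file could look like the sketch below. It assumes the container has already been unpacked into a mapping of member name to bytes (for a real ZIP archive, `zipfile.ZipFile.namelist()` and `.read()` could produce that); the function name and the rule "an untouched side loses to a changed side" are my assumptions, not an existing sync protocol.

```python
from typing import Dict, List, Tuple

Members = Dict[str, bytes]  # member name -> contents; a missing key means "absent"


def merge_container(base: Members, local: Members, remote: Members) -> Tuple[Members, List[str]]:
    """Merge per member; return the merged members and the conflicting names."""
    merged: Members = {}
    conflicts: List[str] = []
    for name in sorted(set(base) | set(local) | set(remote)):
        b, l, r = base.get(name), local.get(name), remote.get(name)
        if l == r:        # both sides agree (including both deleted)
            winner = l
        elif l == b:      # only the remote side changed this member
            winner = r
        elif r == b:      # only the local side changed this member
            winner = l
        else:             # both changed it differently: needs a smarter handler
            conflicts.append(name)
            continue
        if winner is not None:
            merged[name] = winner
    return merged, conflicts


base = {"content.xml": b"v1", "foo.png": b"img1", "old.txt": b"x"}
local = {"content.xml": b"v2", "foo.png": b"img1", "old.txt": b"x"}
remote = {"content.xml": b"v1", "foo.png": b"img2"}  # old.txt deleted remotely
merged, conflicts = merge_container(base, local, remote)
```

Members that land in `conflicts` are where the per-format or app-registered handlers would get their turn, with "use local or remote version completely" as the last fallback.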