Comment by forrestthewoods

16 days ago

Weird sales pitch. I think Git is super mediocre and a VCS that supports binary files would be awesome.

But then the first thing it talks about is diffing files. Which honestly shouldn’t even be a feature of VCS. That’s just a separate layer.

> But then the first thing it talks about is diffing files. Which honestly shouldn’t even be a feature of VCS. That’s just a separate layer.

There is a nuance between Git's line-by-line diffing and what lix does.

For text diffing, it holds true that diffing is a separate layer. Text files are small, which allows on-the-fly diffing (that's what Git does) by comparing two versions of a document.

On the fly diffing doesn't work for structured file formats like xlsx, fig, dwg etc. It's too expensive. Both in terms of materializing two files at specific commits, and then diffing these two files.

What lix does under the hood is track individual changes, _which allows rendering a diff without on-the-fly diffing_. So lix is kind of responsible for the diffs, but only in the sense that it provides a SQL API to query changes between two states. How the diff is rendered is up to the application.
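
To make the idea concrete, here is a rough sketch using Python's built-in sqlite3. It is not lix's actual schema or API: the `change_log` table, its columns, the integer `commit_seq`, and the helper names are all made up for illustration. It only shows the general technique of recording each change as a row when it happens, so that "what changed between two states" becomes a query instead of a file-level diff.

```python
import sqlite3


def open_demo_store():
    """Build an in-memory change log (hypothetical schema, not lix's)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE change_log ("
        " commit_seq INTEGER, file_id TEXT, schema_key TEXT,"
        " entity_id TEXT, value TEXT)"
    )
    # One row per changed entity per commit; the file itself is never stored here.
    rows = [
        (1, "budget.xlsx", "excel_cell", "C43", "100"),
        (1, "budget.xlsx", "excel_cell", "A1", "Total"),
        (2, "budget.xlsx", "excel_cell", "C43", "120"),
        (3, "budget.xlsx", "excel_cell", "B7", "7"),
        (4, "budget.xlsx", "excel_cell", "C43", "150"),
    ]
    conn.executemany("INSERT INTO change_log VALUES (?, ?, ?, ?, ?)", rows)
    return conn


def value_at(conn, file_id, entity_id, seq):
    """Latest recorded value of an entity at or before commit `seq`, or None."""
    row = conn.execute(
        "SELECT value FROM change_log"
        " WHERE file_id = ? AND entity_id = ? AND commit_seq <= ?"
        " ORDER BY commit_seq DESC LIMIT 1",
        (file_id, entity_id, seq),
    ).fetchone()
    return row[0] if row else None


def changes_between(conn, file_id, from_seq, to_seq):
    """Return (entity_id, before, after) for entities touched in (from_seq, to_seq]."""
    touched = conn.execute(
        "SELECT DISTINCT entity_id FROM change_log"
        " WHERE file_id = ? AND commit_seq > ? AND commit_seq <= ?"
        " ORDER BY entity_id",
        (file_id, from_seq, to_seq),
    ).fetchall()
    return [
        (eid, value_at(conn, file_id, eid, from_seq), value_at(conn, file_id, eid, to_seq))
        for (eid,) in touched
    ]


conn = open_demo_store()
diff = changes_between(conn, "budget.xlsx", 1, 4)
for entity_id, before, after in diff:
    print(entity_id, before, "->", after)
```

Neither version of the spreadsheet is materialized or parsed here. The application gets a list of changed entities with before and after values and decides how to render them.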

  • > On the fly diffing doesn't work for structured file formats like xlsx, fig, dwg etc. It's too expensive. Both in terms of materializing two files at specific commits, and then diffing these two files.

    I don’t think that’s actually true?

    How often are binary files being diffed? How long does it take to materialize? How long to run a diff algorithm?

    I’ve worked with some tools that can diff images. Works great. Not a problem in need of solving.

    In any case I’ll give benefit of the doubt that this project solves some real problem in a useful way. I’m not sure what it is.

    My goals in a VCS for binary files seem to be very very very different than yours.

    • I think our goals indeed differ.

      > How often are binary files being diffed? How long does it take to materialize? How long to run a diff algorithm?

      If version control is embedded in an app, constantly.

      Imagine a cell in a spreadsheet. An application wants to display a "blame" for cell C43, i.e., how the cell changed over time.

      The lix way is this SQL query:

      SELECT * FROM state_history WHERE file_id = '<the_spreadsheet>' AND schema_key = 'excel_cell' AND entity_id = 'C43';

      Diffing on the fly is not possible. The information on what changed needs to be available without diffing. Otherwise, finding out how cell C43 changed means diffing the entire spreadsheet file at every commit, which takes ages.

Most version control systems other than Git support binary files. In the industry you most often see Perforce P4 and Subversion being used for that purpose.

  • Correct. Perforce is expensive AF and is also kinda meh. They got bought by private equity and haven’t meaningfully improved it for like 15 years. But they’ve got gamedevs by the balls who don’t have an alternative. It’s unfortunate.