Comment by samuelstros
17 days ago
> But then the first thing it talks about is diffing files. Which honestly shouldn’t even be a feature of VCS. That’s just a separate layer.
There is a nuance between git's line-by-line diffing and what lix does.
For text diffing it holds true that diffing is a separate layer. Text files are small, which allows on-the-fly diffing (that's what git does) by comparing two docs.
On the fly diffing doesn't work for structured file formats like xlsx, fig, dwg etc. It's too expensive. Both in terms of materializing two files at specific commits, and then diffing these two files.
What lix does under the hood is track individual changes, _which allows rendering a diff without on-the-fly diffing_. So lix is kind of responsible for the diffs, but only in the sense that it provides a SQL API to query changes between two states. How the diff is rendered is up to the application.
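To make the idea concrete, here is a minimal sketch of answering "what changed between two states" from a change log instead of materializing and comparing two files. It uses Python's built-in sqlite3, not lix itself. The table `cell_change`, its columns, and the helper `changes_between` are hypothetical names made up for this illustration; the real lix schema and API may look quite different.

```python
import sqlite3

# Hypothetical change log: one row per entity change, tagged with the
# sequence number of the commit that introduced it. Not the lix schema.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE cell_change ("
    "commit_seq INTEGER, file_id TEXT, entity_id TEXT, value TEXT)"
)
conn.executemany(
    "INSERT INTO cell_change VALUES (?, ?, ?, ?)",
    [
        (1, "budget.xlsx", "A1", "Revenue"),
        (1, "budget.xlsx", "C43", "100"),
        (2, "budget.xlsx", "C43", "120"),
        (3, "budget.xlsx", "B2", "Q3"),
        (4, "budget.xlsx", "C43", "150"),
    ],
)


def changes_between(conn, file_id, from_seq, to_seq):
    """Entities changed after from_seq up to to_seq, with their latest value.

    Only the change log is read; no file is rebuilt and no diff is run.
    """
    return conn.execute(
        """
        SELECT c.entity_id, c.value
        FROM cell_change AS c
        JOIN (
            SELECT entity_id, MAX(commit_seq) AS last_seq
            FROM cell_change
            WHERE file_id = ? AND commit_seq > ? AND commit_seq <= ?
            GROUP BY entity_id
        ) AS latest
          ON latest.entity_id = c.entity_id
         AND latest.last_seq = c.commit_seq
        WHERE c.file_id = ?
        ORDER BY c.entity_id
        """,
        (file_id, from_seq, to_seq, file_id),
    ).fetchall()


# Cells touched between commit 1 and commit 4; the untouched A1 never shows up.
print(changes_between(conn, "budget.xlsx", 1, 4))
```

The application then decides how to render those rows, e.g. by highlighting the affected cells.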
> On the fly diffing doesn't work for structured file formats like xlsx, fig, dwg etc. It's too expensive. Both in terms of materializing two files at specific commits, and then diffing these two files.
I don’t think that’s actually true?
How often are binary files being diffed? How long does it take to materialize? How long to run a diff algorithm?
I’ve worked with some tools that can diff images. Works great. Not a problem in need of solving.
In any case I’ll give benefit of the doubt that this project solves some real problem in a useful way. I’m not sure what it is.
My goals in a VCS for binary files seem to be very very very different than yours.
I think our goals indeed differ.
> How often are binary files being diffed? How long does it take to materialize? How long to run a diff algorithm?
If version control is embedded in an app, constantly.
Imagine a cell in a spreadsheet. An application wants to display a "blame" for cell C43, i.e. how did the cell change over time?
The lix way is this SQL query:
SELECT * FROM state_history WHERE file_id = '<the_spreadsheet>' AND schema_key = 'excel_cell' AND entity_id = 'C43';
Diffing on the fly is not feasible here. The information on what changed needs to be available without diffing. Otherwise, answering how cell C43 changed means materializing and diffing the entire spreadsheet file for every commit, which takes ages.
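A rough sketch of why the per-cell lookup is cheap, using an in-memory SQLite table as a toy stand-in for the history. The column names `file_id`, `schema_key` and `entity_id` come from the query above; `commit_seq`, `value`, the index, and the `blame` helper are assumptions for illustration only, not the actual lix schema or API.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Toy history table. commit_seq and value are assumed columns for this sketch.
conn.execute(
    "CREATE TABLE state_history ("
    "commit_seq INTEGER, file_id TEXT, schema_key TEXT, "
    "entity_id TEXT, value TEXT)"
)
# An index on the lookup columns lets the query skip unrelated cells.
conn.execute(
    "CREATE INDEX idx_entity ON state_history (file_id, schema_key, entity_id)"
)
conn.executemany(
    "INSERT INTO state_history VALUES (?, ?, ?, ?, ?)",
    [
        (1, "budget.xlsx", "excel_cell", "C43", "100"),
        (1, "budget.xlsx", "excel_cell", "A1", "Revenue"),
        (2, "budget.xlsx", "excel_cell", "C43", "120"),
        (3, "budget.xlsx", "excel_cell", "B2", "Q3"),
        (4, "budget.xlsx", "excel_cell", "C43", "150"),
    ],
)


def blame(conn, file_id, entity_id):
    """History of one cell, oldest first, read straight from stored changes."""
    return conn.execute(
        "SELECT commit_seq, value FROM state_history "
        "WHERE file_id = ? AND schema_key = 'excel_cell' AND entity_id = ? "
        "ORDER BY commit_seq",
        (file_id, entity_id),
    ).fetchall()


for seq, value in blame(conn, "budget.xlsx", "C43"):
    print(f"commit {seq}: C43 = {value}")
```

The work scales with the number of changes to C43, not with the size of the spreadsheet times the number of commits.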