Comment by nullnix

10 years ago

I had... much too extensive experience both with SCCS weaves and with hacking them way back in the day; I even wrote something which sounds very like your smoosh, only I called it 'fuse'. However, I wrote 'fuse' as a side-effect of something else, 'fission', which split a shorter history out of an SCCS file by wholesale discarding of irrelevant, er, strands and of the history relating to them. I did this because the weave is utterly terrible as soon as you start recording anything which isn't plain text or which has many changes in each version, and we were recording multimegabyte binary files in it by uuencoding them first (yes, I know, the decision was made way above my pay grade by people who had no idea how terrible an idea it was).

Where RCS or indeed git would have handled this reasonably well (indeed the xdelta used for git packfiles would have eaten it for lunch with no trouble), in SCCS, or anything weave-based, it was an utter disaster. Every checkin doubled the number of weaves in the file, an exponential growth without end which soon led to multigigabyte files which xdelta could have represented as megabytes at most. Every one-byte addition or removal doubled up everything from that point on.

And here's where the terribleness of the 'every version takes the same time' decision becomes clear. In a version control system, you want the history of later versions (or of tips of branches) overwhelmingly often: anything that optimizes access time for things elsewhere in the history at the expense of this is the wrong decision.

When I left, years before someone more courageous than me transitioned the whole appalling mess to git, our largest file was 14GiB and took more than half an hour to check out.

The SCCS weave is terrible. (It's exactly as good a format as you'd expect for the time, since it is essentially an ed script with different characters. It was a sensible decision for back then, but we really should put the bloody thing out of its misery, and ours.)

Huh. Now I wonder how BK resolved this.

  • Yeah. I suspect the answer is 'store all binary data in BAM', which then uses some different encoding for the binary stuff -- but that then makes my gittish soul wonder why not just use that encoding for everything. (It works for git packfiles... though 'git gc' on large repos is a total memory and CPU hog, one presumes that whatever delta encoding BAM uses is not.)

    • We support the uuencode horror for compat (and for smaller binaries that don't change) but the answer for binaries is BAM, there is no data in the weave for BAM files.

      I don't agree that the weave is horrible, it's fantastic for text. Try git blame on a file in a repo with a lot of history then try the same thing in BK. Orders and orders of magnitude faster.

      And go understand smerge.c and the weave lightbulb will come on.

      4 replies →