Comment by qwertyuiop924

10 years ago

I thought SCCS had the same problems as RCS. What did it do differently?

35 comments

qwertyuiop924

RCS is patch based: the most recent version is kept in clear text, the previous version is stored as a reverse patch, and so on back to the first version. So getting the most recent version could be fast (it isn't), but the farther back you go in history, the more time it takes. And branches are even worse: you have to patch backwards to the branch point and then forwards to the tip of the branch.
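
To make the cost of the reverse-patch scheme concrete, here is a minimal sketch in Python. It is not RCS's actual code or file format: the edit tuples are only loosely modeled on RCS's `d`/`a` delta commands, and `apply_delta`, `rcs_checkout`, `HEAD` and `REVERSE_DELTAS` are hypothetical names made up for illustration. The point is just that checking out an old revision applies one patch per step back from the head.

```python
def apply_delta(lines, delta):
    """Apply a list of edits to `lines` and return the new list.

    Each edit is ("d", start, count) to delete `count` lines starting at
    1-based line `start`, or ("a", after, new_lines) to add lines after
    1-based line `after` (0 means the top of the file). Line numbers refer
    to the input text, so edits are applied bottom-up to keep them valid.
    """
    result = list(lines)
    # Highest line number first; on a tie, the add goes before the delete.
    ordered = sorted(delta, key=lambda e: (e[1], e[0] == "a"), reverse=True)
    for edit in ordered:
        if edit[0] == "d":
            _, start, count = edit
            del result[start - 1:start - 1 + count]
        else:
            _, after, new_lines = edit
            result[after:after] = new_lines
    return result


# Hypothetical history: revision 3 is the head and is stored in clear text.
HEAD_REV = 3
HEAD = ["line one", "line two", "line three"]
# REVERSE_DELTAS[n] turns revision n+1 back into revision n.
REVERSE_DELTAS = {
    2: [("d", 3, 1)],  # rev 3 -> rev 2: drop "line three"
    1: [("d", 2, 1)],  # rev 2 -> rev 1: drop "line two"
}


def rcs_checkout(wanted):
    """Walk back from the head, applying one reverse delta per revision."""
    lines = list(HEAD)
    for rev in range(HEAD_REV - 1, wanted - 1, -1):
        lines = apply_delta(lines, REVERSE_DELTAS[rev])
    return lines
```

In this sketch `rcs_checkout(3)` applies no patches, while `rcs_checkout(1)` applies two, so the work grows with the distance from the head.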

SCCS is a "weave". The time to get the tip is the same as the time to get the first version or any version. The file format looks like

  ^AI 1
  this is the first line in the first version.
  ^AE 1

That's "insert in version 1", then the data, then "end of insert for version 1".

Now let's say you added another line in version 2:

  ^AI 1
  this is the first line in the first version.
  ^AE 1
  ^AI 2
  this is the line that was added in the second version
  ^AE 2

So how do you get a particular version? You build up the set of versions that are in that version. In version 1, that's just "1"; in version 2, that's "1, 2". So if you wanted to get version 1, you sweep through the file and print anything that's in your set. You print the first line, get to the ^AI 2 and look to see if that's in your set. It isn't, so you skip until you get to the ^AE 2.

So any version takes the same time. And that time is fast: the largest file in our source base is slib.c, 18K lines, and it checks out in 20 milliseconds.
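
The sweep described above can be sketched in a few lines of Python. This is an illustration under assumptions, not BK or SCCS source: `checkout` and `WEAVE` are hypothetical names, the weave is held as a list of strings, control lines are taken to start with the ASCII SOH byte (what the thread writes as ^A), and only insert blocks are handled.

```python
CTRL = "\x01"  # SOH, displayed as ^A in the examples above

# The two-version weave from the example, one list entry per line.
WEAVE = [
    CTRL + "I 1",
    "this is the first line in the first version.",
    CTRL + "E 1",
    CTRL + "I 2",
    "this is the line that was added in the second version",
    CTRL + "E 2",
]


def checkout(weave, included):
    """Return the lines of one version in a single pass over the weave.

    `included` is the set of version numbers that make up the wanted
    version, e.g. {1} for version 1 and {1, 2} for version 2.
    """
    out = []
    open_inserts = []  # versions of the insert blocks we are currently inside
    for line in weave:
        if line.startswith(CTRL):
            op, ver = line[1:].split()
            if op == "I":
                open_inserts.append(int(ver))
            elif op == "E":
                open_inserts.remove(int(ver))
            continue
        # Print a data line only if every enclosing insert is in the set.
        if all(v in included for v in open_inserts):
            out.append(line)
    return out
```

Every call reads the whole weave once, whichever version is asked for, which is why the checkout time does not depend on how old the version is.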

nullnix 10 years ago
I had... much too extensive experience both with SCCS weaves and with hacking them way back in the day; I even wrote something which sounds very like your smoosh, only I called it 'fuse'. However, I wrote 'fuse' as a side-effect of something else, 'fission', which split a shorter history out of an SCCS file by wholesale discarding of irrelevant, er, strands and of the history relating to them. I did this because the weave is utterly terrible as soon as you start recording anything which isn't plain text or which has many changes in each version, and we were recording multimegabyte binary files in it by uuencoding them first (yes, I know, the decision was made way above my pay grade by people who had no idea how terrible an idea it was).
Where RCS or indeed git would have handled this reasonably well (indeed the xdelta used for git packfiles would have eaten it for lunch with no trouble), in SCCS, or anything weave-based, it was an utter disaster. Every checkin doubled the number of weaves in the file, an exponential growth without end which soon led to multigigabyte files which xdelta could have represented as megabytes at most. Every one-byte addition or removal doubled up everything from that point on.
And here's where the terribleness of the 'every version takes the same time' decision becomes clear. In a version control system, you want the history of later versions (or of tips of branches) overwhelmingly often: anything that optimizes access time for things elsewhere in the history at the expense of this is the wrong decision.
When I left, years before someone more courageous than me transitioned the whole appalling mess to git, our largest file was 14GiB and took more than half an hour to check out.
The SCCS weave is terrible. (It's exactly as good a format as you'd expect for the time, since it is essentially an ed script with different characters. It was a sensible decision for back then, but we really should put the bloody thing out of its misery, and ours.)
- qwertyuiop924 10 years ago
  
  Huh. Now I wonder how BK resolved this.
  
caf 10 years ago
Presumably if you then delete that first line in the third version, you get something like this?

  ^AI 1
  this is the first line in the first version.
  ^AE 1
  ^AD 3
  ^AI 2
  this is the line that was added in the second version
  ^AE 2
- luckydude 10 years ago
  
  Close. By the way, there is a bk _scat command (sccs cat, not poop) that dumps the ASCII file format so you can try this and see.
  The delete needs to be an envelope around the insert so you get

    ^AD 3
    ^AI 1
    this is the first line in the first version.
    ^AE 1
    ^AE 3
    ^AI 2
    this is the line that was added in the second version
    ^AE 2
  That whole weave thing is really cool. The only person outside of BK land that got it was Bram Cohen in Codeville, I think he had a weave.
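
For anyone who wants to play with the delete envelope, here is a rough Python sketch of a checkout that understands both inserts and deletes. It is a simplification, not the real SCCS or BK algorithm: `checkout_with_deletes` and `WEAVE_V3` are made-up names, control lines are assumed to start with ASCII SOH (shown as ^A), and a version is assumed never to have two blocks open at the same time.

```python
CTRL = "\x01"  # SOH, displayed as ^A in the thread

# The three-version weave with version 3's delete wrapped around version 1's insert.
WEAVE_V3 = [
    CTRL + "D 3",
    CTRL + "I 1",
    "this is the first line in the first version.",
    CTRL + "E 1",
    CTRL + "E 3",
    CTRL + "I 2",
    "this is the line that was added in the second version",
    CTRL + "E 2",
]


def checkout_with_deletes(weave, included):
    """One pass over the weave, honouring insert and delete envelopes."""
    out = []
    open_blocks = {}  # version number -> "I" or "D" for each open block
    for line in weave:
        if line.startswith(CTRL):
            op, ver = line[1:].split()
            if op in ("I", "D"):
                open_blocks[int(ver)] = op
            elif op == "E":
                open_blocks.pop(int(ver), None)
            continue
        inserted = all(v in included for v, op in open_blocks.items() if op == "I")
        deleted = any(v in included for v, op in open_blocks.items() if op == "D")
        # Keep the line if all its inserts are wanted and no wanted version deleted it.
        if inserted and not deleted:
            out.append(line)
    return out
```

With the set {1, 2, 3} the first line sits inside a delete that is in the set, so it is skipped; with {1} or {1, 2} the delete is ignored and the line comes back.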
  
qwertyuiop924 10 years ago

That actually is pretty neat
clacke2 10 years ago
Aha, so that's where bzr got it from. :-)
- luckydude 10 years ago
  
  bzr got more than that from BK, it got one of my favorite things, per-file checkin comments. I liken those to regression tests: when you start out you don't really value them, but over time the value builds up. The fact that Git doesn't have them bugs me to no end. BZR was smart enough to copy that feature and that's why MySQL chose bzr when they left BK.
  The thing bzr didn't care about, sadly, is performance. An engineer at Intel once said to me, firmly, "Performance is a feature".
  

pklausler 10 years ago

In short, RCS maintains a clean copy of the head revision, and a set of reverse patches to be applied to recreate older revisions. SCCS maintains a sequence of blocks of lines that were added or deleted at the same time, and any revision can be extracted in the same amount of time by scanning the blocks and retaining those that are pertinent.

Really old school revision control systems, like CDC's MODIFY and Cray's clone UPDATE, were kind of like SCCS. Each line (actually card image!) was tagged with the ids of the mods that created and (if no longer active) deleted it.

rbsmith 10 years ago
| CDC's MODIFY and Cray's clone UPDATE, were kind of like SCCS
Do you have references? I've heard of these but haven't come across details after much creative searching since they are common words.
- pmcjones 10 years ago
  
  See http://www.bitsavers.org/pdf/cdc/cyber/software/ .
  
- luckydude 10 years ago
  
  I've heard that too. It comes from card readers somehow.