Comment by ajuc

11 years ago

The tree would just give you code-folding for free. Callgraphs, searching for references etc would still need to be recalculated by the code indexer. The only advantage is - no parsing step.

Meanwhile you would need to rewrite all the universal text-based tools from scratch specifically for your particular binary format. And this almost never happens so people are left with no way to merge files (Oracle Forms I'm looking at you).

BTW - merging and diffing files is when the abstraction often leaks, too, or at least wants to leak.

For example - when you have 2 trees with every node the same, but root node changed from <interface> to <class>. I guess your tree-based tool shows the whole tree as a difference? What about when you wrap half of the tree in <namespace>?

Textual diff would be 2 lines in both examples.

There are many possible situations, and I admit that in some tree-based approach is better, but there are many situations where you need better granularity than is possible without leaking the lower level. With text format that lower level is human-readable, and you can automerge unsafely (but concisely) and leave fixing the result to human. With binary format if you couldn't detect concise description of the change - you just show "everything was deleted and that new file was added" which isn't particulary helpful to the person that merge changes.

Can you merge word documents or databases? There's certainly market for that.

Callgraphs, searching for references etc would still need to be recalculated by the code indexer.

The what? Sorry, I really don't know what you're talking about. If you want to introduce a reference, logical thing to do is checking it in the same moment.

The only advantage is - no parsing step.

There are many more, see my other reply to icebraining.

About tools, you seem to give more importance to diff than to write programs with a more powerful environment. I can't agree. Also I don't think creating a suitable diff for trees is a hard problem. I just need to convince Linus of the idea and he will write one in a couple of days :-)

  • Oh, my bad, so you write references into the sourcefiles - smart. Your code isn't a tree anymore, but a graph, but other than that it will work.

    Of course you still need code indexers for reverse lookups. You can't write "I am called from a,b,c,..." to source files because printf would be several gigabytes and growing each second:) So you still need code indexer to look up the code graphs in your workspace and build the index where this is called from.

    Regarding the diff/merge tooling - IMHO it's much more important than powerful editor. If my company forced me to write code in mcedit I could live with that. I would quit straight away if we had no version control, no matter the IDE we would use. I worked for a short time in oracle forms and I won't do that again.

    Re: merging of trees - you keep references in the trees so it's not a tree anymore, but graph possibly with cycles - makes it even harder to merge them properly (and you can't do unsafe merges because people won't fix them if it's binary format).

    • you keep references in the trees so it's not a tree anymore, but graph possibly with cycles

      Cycles, why? Tree+links can be treated as a tree visually.

      Edit: Oh, I see what you meant with indexer. That information would be in memory. In the database it's stored in just one direction. When loaded it expands bidirectionally.

      1 reply →

The key to merging trees is that your editor generates the diff on the fly as you write—your editor's undo/redo stack and your version control system become one and the same. Every time you add, move, copy, switch out, or delete a node, the editor notes the operation on your delta log. When you push your code externally, you're really just shipping the log.

With these semantic deltas, merging should be highly unambiguous and automatable, even in degenerate cases.

  • Hm, the history will be dirty with abandoned experiments etc, but so what - great idea. I'd like to use that system for a while, but I'm still not sure if it will work.

    I remember using jbpm graph language - it had nice graphical editor, but we still switched to xml view to very often, because it was much faster to work with text.

    Maybe it's just a matter of proper tools, but I can't imagine how you allow graphic language to do this for example:

    sed s/foo\("bar", ([a-Z0-9]+), "baz"\)/foo\(\1\); bar\(\1\); baz\(\1\);/g

    Maybe it's problem with my imagination.

    • When I look at that line, my brain sees its structure, the different layers... :) It is totally representable without restoring to plaintext. With good fundamentals (ADTs) and good UX (an editor that looks like a text editor but is so much more) we can make it work.