Comment by narag
11 years ago
OK, there are so many advantages of text. I wholeheartedly agree and prefer plain text over most "smart" formats.
BUT.
I have this pet idea about source code. Text isn't the optimal format because a program is not a linear thing, but closer to a tree structure.
Why have we settled on programs written in text? Pretty much for all the reasons you wrote, and because we've had bad experiences with other kinds of formats in the past. Being able to fall back to plain text when things go wrong is very nice.
But it has its own sort of usability problems. There's an impedance mismatch between text and programs and sometimes it shows. Actually we don't see it more often because we tend to think that text is the way it is done, always was done and always will be done.
> Text isn't the optimal format because a program is not a linear thing, but closer to a tree structure.
I can tell you what it's like from experience. The Interlisp-D environment used a structure editor rather than a text editor. I found it infuriating and clumsy. Admittedly I had come from the emacs-infused PDP-10/Maclisp & Lispm world, so I gave it several months, but in the end I adapted an Emacs someone else had started and did all my editing in that.
I figure if this would work for any language it would be Lisp, and it didn't work for me. It sounds like a great idea, since if the editor's "buffer structure" is the program structure it's easy to write lambda functions to, say, support refactoring your code. But it was rarely convenient.
The other thing that didn't work for me was that it was of course a mouse-driven interface (this was PARC after all) and I found shifting my hand off the keyboard all the time slowed me down a lot too.
> I found it infuriating and clumsy.
I've heard of programs that tried to do that before and had massive usability problems. Your comment seems to confirm that diagnosis.
> I figure if this would work for any language it would be Lisp
I guess it would be more of a novelty for other languages.
> I found shifting my hand off the keyboard...
That's a big no-no!
EDIT: By the way, I've been experiencing something similar recently, teaching Scratch and AppInventor to my son and a group of children at Coder-Dojo:
http://medialab-prado.es/article/coderdojo
Kids like the mouse interface, but I do find it very limiting and slightly infuriating.
Have you tried paredit mode for emacs?
It's already done in most IDEs and you probably use it :)
Ctrl+click on a method invocation - did it jump to the declaration?
Click on some method name and choose "show all invocations" or something like that. Here you have one of the possible tree views of code.
Another tree-view of code is visible when you have code folding feature enabled.
Another one - when you debug and show subfields of some variable.
Yes, it's nice to have these additional abstractions over plain text. But these abstractions are inherently leaky, and I much prefer to work with text files than with some binary format to fix the leaks.
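The "show all invocations" view is essentially a query over the syntax tree. A minimal sketch using Python's standard `ast` module (Python 3.9+ for the type hints); `find_calls` is a hypothetical helper, and unlike a real IDE's reference indexer it only catches calls by plain name, not method calls or aliased imports:

```python
import ast

SOURCE = """
def greet(name):
    return "Hello, " + name

def main():
    greet("Ada")
    print(greet("Grace"))
"""

def find_calls(source: str, func_name: str) -> list[int]:
    """Return the sorted line numbers where `func_name` is called by plain name."""
    tree = ast.parse(source)
    lines = []
    for node in ast.walk(tree):
        # Only direct calls like greet(...); obj.greet(...) is an ast.Attribute and is skipped.
        if (isinstance(node, ast.Call)
                and isinstance(node.func, ast.Name)
                and node.func.id == func_name):
            lines.append(node.lineno)
    return sorted(lines)

print(find_calls(SOURCE, "greet"))
```

The source string starts with a newline, so the two calls to `greet` sit on lines 6 and 7.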
> It's already done in most IDEs and you probably use it :)
That's a kludge. The "true" form is text and the IDEs add layers on top. The right way is to make the tree the canonical form and leave text just as an interchange format.
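One way to picture "tree as canonical form, text as interchange format" is Python's `ast` module: two differently formatted sources that parse to the same tree compare equal, and `ast.unparse` (Python 3.9+) regenerates a normalized text from the tree. A sketch assuming Python as the example language; note that `ast` drops comments entirely, which a real tree-first editor would have to keep as nodes:

```python
import ast

# Two spellings of the same statement: different spacing, redundant parentheses, a comment.
A = "total = (price*qty)  # compute\n"
B = "total = price * qty\n"

def canonical(source: str) -> str:
    """Dump the AST without position attributes, so formatting and comments don't matter."""
    return ast.dump(ast.parse(source))

print(canonical(A) == canonical(B))
# Regenerate normalized text from the tree (the "interchange format").
print(ast.unparse(ast.parse(A)))
```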
> But these abstractions are inherently leaky...
Inherently. That's bold and you give no justification. Anyway I like how you prove my last paragraph. It's been conventional wisdom for so long that you consider text representation as the fundamental form and using a format closer to the real structure a leaky abstraction.
The tree would just give you code-folding for free. Callgraphs, searching for references etc would still need to be recalculated by the code indexer. The only advantage is - no parsing step.
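To illustrate "code folding for free": once you have the tree, fold regions are just the line spans of block-level nodes. A sketch using Python's `ast` (the `end_lineno` attribute needs Python 3.8+); `fold_regions` is a hypothetical helper, not any editor's API:

```python
import ast

SOURCE = '''class Greeter:
    def hello(self):
        return "hi"

    def bye(self):
        return "bye"
'''

def fold_regions(source: str) -> list[tuple[str, int, int]]:
    """Return (name, first_line, last_line) for every class and function, by start line."""
    regions = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.ClassDef, ast.FunctionDef, ast.AsyncFunctionDef)):
            regions.append((node.name, node.lineno, node.end_lineno))
    return sorted(regions, key=lambda r: r[1])

for name, start, end in fold_regions(SOURCE):
    print(f"{name}: lines {start}-{end}")
```

Cross-references and call graphs, as the comment says, still need a separate indexing pass; the tree only removes the parsing step.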
Meanwhile you would need to rewrite all the universal text-based tools from scratch specifically for your particular binary format. And this almost never happens so people are left with no way to merge files (Oracle Forms I'm looking at you).
BTW - merging and diffing files is when the abstraction often leaks, too, or at least wants to leak.
For example - when you have 2 trees with every node the same, but root node changed from <interface> to <class>. I guess your tree-based tool shows the whole tree as a difference? What about when you wrap half of the tree in <namespace>?
Textual diff would be 2 lines in both examples.
There are many possible situations, and I admit that in some of them a tree-based approach is better, but there are many situations where you need finer granularity than is possible without leaking the lower level. With a text format that lower level is human-readable, and you can automerge unsafely (but concisely) and leave fixing the result to a human. With a binary format, if you can't detect a concise description of the change, you just show "everything was deleted and this new file was added", which isn't particularly helpful to the person merging the changes.
Can you merge word documents or databases? There's certainly market for that.
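The interface-to-class rename example above can be made concrete. The sketch compares a line-based diff (Python's `difflib.ndiff`) with a deliberately naive tree diff that replaces a whole subtree whenever the labels at the same position differ. The `(label, children)` tuple format and `naive_tree_diff` are assumptions for illustration; real tree-diff algorithms (e.g. tree edit distance) can do much better than this:

```python
import difflib

OLD_TEXT = ["interface Shape {", "  area();", "  perimeter();", "}"]
NEW_TEXT = ["class Shape {", "  area();", "  perimeter();", "}"]

# A tree node is (label, tuple_of_children).
OLD_TREE = ("interface Shape", (("area", ()), ("perimeter", ())))
NEW_TREE = ("class Shape", (("area", ()), ("perimeter", ())))

def changed_lines(old: list[str], new: list[str]) -> list[str]:
    """Lines a line-based diff marks as removed (-) or added (+); '?' hint lines are dropped."""
    return [line for line in difflib.ndiff(old, new) if line[:2] in ("- ", "+ ")]

def size(tree) -> int:
    """Number of nodes in a (label, children) tree."""
    _, children = tree
    return 1 + sum(size(c) for c in children)

def naive_tree_diff(old, new) -> int:
    """Count nodes reported as changed, replacing a whole subtree on any label mismatch."""
    (old_label, old_kids), (new_label, new_kids) = old, new
    if old_label != new_label or len(old_kids) != len(new_kids):
        return size(old) + size(new)  # "everything deleted, everything inserted"
    return sum(naive_tree_diff(o, n) for o, n in zip(old_kids, new_kids))

print(len(changed_lines(OLD_TEXT, NEW_TEXT)), naive_tree_diff(OLD_TREE, NEW_TREE))
```

Here the line diff reports 2 changed lines while the naive tree diff reports all 6 nodes (3 deleted, 3 inserted), which is the failure mode the comment describes.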
> That's a kludge. "True" form is text and the IDEs add layers on top.
Why is text the "true" form? What's the difference between an IDE that uses text as the "true" form and one which uses a tree as the "true" form but uses text as the UI and serialization format?
That's why my favourite language does represent source code as trees…
That is, in Lisp the source is lists of lists and atoms, which are trees of data. It's pretty cool.
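To show what "lists of lists and atoms" means, here is a toy reader in Python that turns Lisp text into nested Python lists. It is only a sketch: it handles parentheses, integers and symbols, but not strings, quote syntax, or error recovery, and it is not how any particular Lisp implements its reader:

```python
def tokenize(text: str) -> list[str]:
    """Split Lisp text into parentheses and atoms."""
    return text.replace("(", " ( ").replace(")", " ) ").split()

def read(tokens: list[str]):
    """Build a nested list (a tree) from tokens, consuming them from the front."""
    token = tokens.pop(0)
    if token == "(":
        node = []
        while tokens[0] != ")":
            node.append(read(tokens))
        tokens.pop(0)  # drop the closing ")"
        return node
    if token == ")":
        raise SyntaxError("unexpected )")
    try:
        return int(token)
    except ValueError:
        return token  # anything that isn't an integer is treated as a symbol

print(read(tokenize("(defun square (x) (* x x))")))
# ['defun', 'square', ['x'], ['*', 'x', 'x']]
```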
All languages actually represent source code as trees. It's just that the shavings are less obvious in other languages.
It would be nice to have an IDE that surfaced something like the AST. Of course lots do to some extent, but I bet there is room for improvement here. This also seems like a place where Lisp would have an advantage since the syntax is so transparent.
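As a rough sketch of "surfacing the AST", Python's standard `ast` module can already render a program as a tree outline. `outline` is a hypothetical helper; it prints node types only, where a real IDE view would also show names, values and source positions:

```python
import ast

def outline(source: str) -> list[str]:
    """Render the AST as an indented outline, one node type per line."""
    lines = []

    def visit(node: ast.AST, depth: int) -> None:
        lines.append("  " * depth + type(node).__name__)
        for child in ast.iter_child_nodes(node):
            visit(child, depth + 1)

    visit(ast.parse(source), 0)
    return lines

print("\n".join(outline("x = 1 + 2")))
```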
For readers of yesterday's article about whether all the "easy" stuff has already been accomplished, here is an easy-to-read survey of the state of the art in diffing trees:
http://useless-factor.blogspot.com/2008/01/matching-diffing-...
That sounds like a pretty interesting topic to research! And I note that his oldest citation is from 1997, and most are from the last ten years. He also briefly mentions "operational transformation," which I agree seems related and is another area of ongoing research. Both topics seem like they would have lots of practical applications, but right now the general-purpose tooling is weak or doesn't exist. So there is room not just for research but also for folks to implement that research.
Our ‘plain text’ is unfortunately not up to the task of representing all human written text; I'm thinking specifically of traditional mathematical notation, which is also a tree structure represented in two dimensions, and ancestral to programming notation, in that we first squeezed mathematical notation down to one dimension¹ and then augmented it with notations for control flow.
¹with a few forgotten exceptions like the Klerer-May system.
TeX, runoff, others.
> prefer plain text over most "smart" formats
To be usable and easily read, plain text needs word spacing and line breaking, which is a form of "smart" formatting.
> Why have we settled on programs written in text?
Again, text with newlines, indenting, and fixed-width fonts, without which it can't be read. So we're talking 2D text here.
Totally agree! Plaintext is an unnormalized form of program code, and working in it generates all sorts of artificial problems. I've started various pet projects to try to be able to edit the AST naturally, but haven't seen much success yet. The UX is really difficult.
I think the Light Table team is trying to do this now with Eve, although it sounds like they are turning it into something even more revolutionary but further from textual visualization.
> working in it generates all sorts of artificial problems.
Indeed. Escaping characters in comments or strings is one obvious example. Programmers serve the compiler instead of the other way around.
> I've started various pet projects to try to be able to edit the AST naturally, but haven't seen much success yet. The UX is really difficult.
I was also working on this some months ago, using SQLite and Lazarus. I hope I can recover the project now.
If you or anyone else is interested, feel free to contact me by email; it's (for real) in my profile.
The solution for graph representation in a program is "program graph over text". The same way as web pages are described as "HTML over text".