← Back to context

Comment by gruseom

13 years ago

Editing complex functions in a spreadsheet is an absolute nightmare

Can you expand on why? Also, if you can imagine a spreadsheet that actually did a good job of this, what would it look like?

I can think of a few reasons, though they may not be the ones he has in mind.

There's the fact that you're dealing with cell references rather than variable names, so all of your expressions look like ($K4 - $S$1) rather than (principal - payment).

There's the fact that the IDE you're working in is trash -- rather than a text file with carefully indented parenthetical statements, it's a single line text field. Sort of like trying to code in a URL bar.

There's the fact that pieces of a program are often littered all over the spreadsheet, and it's hard to look at the whole thing at once.

There's the fact that it's extremely stateful -- the whole thing exists and depends on a table of values -- and when and where and how and in what order they're updated. From a programming perspective, everything is a global variable, and anything can update anything, and setting the state of those global variables is the only way for functions to return data or talk to each other.

I would say all of these are problems that Real Programming Languages have under control, so I can't say I worry too much about such an IDE descending into spreadsheet madness. I do wonder how you would show meaningful realtime results, though, without running a program from the top.

  • > There's the fact that you're dealing with cell references rather than variable names, so all of your expressions look like ($K4 - $S$1) rather than (principal - payment).

    In Excel, use the "Name Manager" dialog and the "Name Box" on the formula bar. They re somewhat hidden, but discovery of them forever changed my spreadsheets!

    • Even better, stick the name at the end of the row or column, select the entire range and press alt-i n c. Good luck discovering that since they introduced the ribbon bar, though.

  • Oh come on, spreadsheets aren't that bad. Excel has had named ranges forever, and they can be scoped to the worksheet so they aren't completely global.

    http://office.microsoft.com/en-us/excel-help/define-and-use-...

    And functions can be VBA so the state of those global variables isn't the only way for functions to talk to each other. e.g.

        Dim x As Integer
    
        Function setx(n)
            x = n
            setx = 1
        End Function
    
        Function getx()
            getx = x
        End Function
    

    and then you can put =setx(200) in one cell and =getx() in another. It is hard to look at the whole thing at once, but when do you need to do that?

  • Excel + VBA solves a lot of that. The spreadsheet part just becomes a view.

    Somebody should really build a web app version of that, there are millions of custom Excel+VBA spreadsheets spread throughout businesses across the world.

    The only way they will migrate online is either through custom web apps (I used to do a lot of those) or with a generic solution which doesn't exist yet.

    • To migrate them online, do you think what's needed is a single-click "just upload your Excel spreadsheet and it will all work, including the VBA"? That is, how important is 100% compatibility, including VBA? (The technical bar for that is pretty high.)

      1 reply →

  • "There's the fact that you're dealing with cell references rather than variable names"

    Excel has named ranges - allows you to give a meaningful name to single cell or a range of cells - a pretty widely used feature.

    "it's a single line text field"

    Excel's formula editor can be as large as you want.

  • > all of your expressions look like ($K4 - $S$1) rather than (principal - payment).

    Numbers.app has the nice feature (among others) to use column/row headers to name cells/colums/rows in formulas in a readable manner.

  • Back in the NeXT days there were a bunch of variable based spreadsheets. The stars were Quantrix and Lotus Improv.

    Data and formulas were kept completely separate so you could change them at will.

  • Those reasons are spot on. Also errors are difficult to track down and spreadsheets have horrible code re-use.

    • Oh yeah. I knew there was one I forgot.

      You get compiler errors about as helpful as trying to debug C++ macros -- namely, something is wrong with this big glob of code, but it ain't gonna tell you what or where.

Spreadsheets would be nicer if they had a coherent dataflow model. Some spreadsheet programs do this automatically, but it could be a great place for a visual programming language too.

One ugly pattern that I'm often doing is to set a column with values by the function "A3=A2" and dragging that down. This gives a column of constant values which are tweakable by tweaking just the first element. That's what a scalar variable looks like in a spreadsheet. I'd like a separate calculation area where I could refer to these. Also when I select a column I might right-click and choose "Reduce → Mean", and the mean of that column could appear in the scalar area too. With some visual tinkering you might even have a sort of "iframe" containing the columns and the scalars at the bottom of the page, and have the mean appear underneath the column to remind me what it's the mean of -- but it should float; I shouldn't have to move it when I want to drag-down the calculation to enter in a few more rows.

With spreadsheet logic you have a couple of atomic operations like, "this is a fixed column specified by me," or "this is a column containing every half-int from 12.5 through 28.5" and so on. You have reductions of those list types, like Mean and Sum and Length. You have transformations of the data too, like NormalDist(x, mean, stdev). And some of them are cached-recursive, e.g. the running-sum function you might use to find a balance given a transaction history:

    balance[-1] = 0
    balance[i] = change[i] + balance[i - 1]

Finally you've got a wide variety of visualizations of that data, which might also be linked from the "scalar area" -- in fact it might be nice to develop ultrasmall "thumbnail versions" that update dynamically as the data updates.

I think those elements are sort of the "core" of a spreadsheet and are handled woefully inadequately by Excel, which was not originally designed for the popular usage case it has become.

  • Spreadsheets would be nicer if they had a coherent dataflow model.

    What do you mean by "coherent dataflow model"? (I'm working on these problems - hence all the questions.)

    This gives a column of constant values which are tweakable by tweaking just the first element. That's what a scalar variable looks like in a spreadsheet.

    Why not just put the value in a cell and reference that cell absolutely?

    With some visual tinkering you might even have a sort of "iframe" [...] but it should float

    It seems that the general solution here would be (a) make it much easier to decompose a problem across multiple sheets (in your example, scalars and reductions could go in a different sheet), and (b) allowing sheets to "float" if you want them to, rather than always being in a different tab that you're forced to switch to. Does that make sense?

    balance[i] = change[i] + balance[i - 1]

    It's interesting that you single out this kind of recurrent calculation, where a later value of a column depends on an earlier value. It doesn't get mentioned very often. But it's fundamental to what spreadsheets do computationally and is the reason why parallelizing them is a lot harder than at first appears.

    • (1) You're right, I should have explained that more. I wanted to contrast this with something like "spreadsheets today have a datum-flow model" but a nice phrasing didn't come to me and I posted without remembering to insert the missing sentence.

      What I mean is that spreadsheets are (right now) fundamentally based on the idea of a grid of cells which are individually meaningless and can contain anything, any individual datum, and datums may refer to each other by arbitrary operations. This grid view might be a good way to present datums to users but it requires a style convention when you want to write it to be readable; it encourages styles which obscure your ability to actually see what this sheet does.

      It's not just that you can't see how the data flows, although that's part of it -- it's that the data is allowed to flow in ways that you could never easily visualize in the first place. Imagine that we simply draw the "depends on" relation by drawing a little curvy arrow from A to B if B depends on A. The Excel equivalent of "spaghetti code" could then literally look like spaghetti on the spreadsheet -- it would have neither head nor tail.

      This could be solved with a nice model for how data, not individual datums, are allowed to flow through the application. Calculating a velocity might be as simple as writing "(x - last(x))/(t - last(t))", if x and t accepted vector subtractions and last(q)[i] == q[i - 1].

      (2) I'm not entirely sure what you think the referring code is doing, if not putting the values in cells and referencing those cells. The reason why I can't be "absolute" about it is because in Calc (and Excel the last time I used it), to extend a computation over a vector, you highlight the computation and then click in a resizing corner to resize it into an area parallel to the input vectors -- or else you use some right-click "Fill" tool.

      I used to think that these tools were broken but I think I can now appreciate that, because their model is so easily grasped, it's not really a break if it's hard to say, "no! I wanted this parameter fixed.

      (3) That sounds suspicious. mean(v) should be associated with the column v in a clear way.

      2 replies →

    • Maybe by "coherent dataflow model" he means some separation between what's an input to any given operation, so you can see what data is going where and how instead of it being scattered all over the place. That's what I'd like, anyway.

      What it looks like to me is more like a program than a table, but with really good list/table entry and flow arrangement tools. That may just be because I'm a programmer.

      Can you share anything more about what you're working on? These are interesting problems to me, too.

      5 replies →