← Back to context

Comment by tomcam

6 years ago

Your project sounds like fun. What/for whom is the project? What level of Excel formula compatibility are you shooting for, e.g. just in the neighborhood like Google Sheets, reasonably close as in Libre Office, or full compatibility with the latest version? If the latter you must have one gnarly test suite. Would love to hear more about it.

The compiler is for Calcapp, an app designer for Excel-savvy users. Our aim is to offer a function library that is easy to pick up for our target audience. That means that we generally follow Microsoft's lead in terms of naming and parameters, but we often offer extensions. Here are a few examples:

* Functions like SUMIF operate on a range or an array and use a textual mini-language to determine which values to operate on. =SUMIF(B1:B2, ">3"), where B1 contains 2 and B2 contains 4, returns 4, because only B2 satisfies the ">3" condition. We're not big fans of that mini-language (which expects you to use the string concatenation operator & to handle dynamic values). We support it, through an overloaded function, but also offer a version which takes a lambda as the second parameter. In Calcapp, SUMIF({ 2, 4 }, Item > 3) and SUMIF({ 2, 4 }, E -> E > 3) (with a named lambda parameter) are equivalent to SUMIF({ 2, 4 }, ">3"). The lambda syntax enables much more complex conditions to be expressed.

* Some functions like SORTBY take a sort order parameter, which is an integer where 1 means sort in ascending order and -1 means sort in descending order. We support functions with integer sort order parameters, but also offer an overloaded version taking an enumerated value instead (SortOrder.ASCENDING or SortOrder.DESCENDING), which we think helps with readability.

* Excel offers logical functions like AND, OR and NOT. We like the C-style operators &&, || and ! better, so we support them in addition to Excel's functions. https://www.calcapp.net/blog/2017/09/16/logical-operators.ht...

* Excel's logical operators (like =, <> and <=) return arrays when applied to arrays and ranges. For instance, { 1, 2 } = { 1, 3 } returns { TRUE, FALSE }. (This is easier to see in recent versions of Excel, where array results "spill," meaning that a return array with two elements spills to occupy two cells of the grid. In earlier versions, as well as Google Sheets, you'll have to use INDEX to tease out the results.) We support = and !=, but also == and !=, where the two latter operators always return scalar (non-array) values. == enables users to determine if two arrays are identical, without having to use AND on the result array (which is the standard Excel trick). Also, our == operator is more traditional in that string comparisons are case-sensitive (unlike =).

* Experienced Excel users like to use double negation to turn logical arrays into number arrays. We have a specialized operator, --, for that, because allowing two unary operators to follow one another would introduce other issues. Here's an example: https://exceljet.net/excel-functions/excel-sumproduct-functi...

* The Excel operators * and + can be used with logical arrays and ranges to signify logical AND, as well as logical OR, respectively. (https://exceljet.net/formula/sum-if-cells-contain-either-x-o...) We have overloaded versions of those operators that take logical arrays for that reason. (Again, Calcapp uses static typing, which makes this a lot more complicated for us than it is for Excel.)

* The latest versions of Excel have great support for dynamically-allocated arrays (along with new array functions like FILTER and SORT). Their new calculation engine supports users providing arrays where, traditionally, non-array parameter values are expected. If arrays are provided for these parameters, then an array is returned. For instance, =SQRT({ 4, 16 }) returns { 2, 4 }. Microsoft internally calls this "lifting" parameters. We refer to these parameters as being "array-capable." In our type system, an array-capable type is a union type (T|Array<T>), with the notable difference that if an array is provided to an array-capable parameter, then the return type is no longer T, but Array<T>.

We do have a test suite for the runtime formula function implementation with a couple of thousand unit tests. Given the number of supported formula functions, though, we should probably have many more.

Most of what I have described here pertains to a new Calcapp version, which hasn't been released yet. In particular, while I am done implementing the compiler, it still needs to be documented (lots of Javadoc), user-facing documentation needs to be written and the runtime system needs to be implemented, so we still have a few (fun!) months to go before everything is ready.

Thanks for taking an interest in what we do.

  • Looks like you’re doing everything right. That is just a fun project. Thank you so much for sharing.