Comment by freehorse

1 month ago

Why is the `[1,2,3] + [4;5;6]` syntax a footgun? It is a very concise, comprehensible and easy way to create matrices in many cases. E.g., if you have a time series S, then `S - S'` gives the differences between all pairs of its elements. Or you have two string arrays and you want all combinations of the two.

`diag` is admittedly unfortunate and it has confused me too. It should really be two different functions (which are sort of inverses of each other, weirdly making it sort of an involution).

What happens most of the time with inexperienced or distracted users is that they write things like `norm(S - T)` to compute how close two vectors are, but one of them is a row vector and the other is a column vector, so the result is silently and completely wrong.

Matlab's functions like to create row vectors (e.g., linspace) in a world where column vectors are more common, so this is a common occurrence.

So `[1,2,3] + [4;5;6]` is a concise syntax for an uncommon operation, but unfortunately it is very similar to a frequent mistake for a much more common operation.

Julia tells the two operations (vector sum and outer sum) apart very elegantly: one is `S - T` and the other is `S .- T`. The dot here is very idiomatic and consistent with the rest of the syntax.
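
A minimal numpy sketch of the `norm(S - T)` mistake described above. The values of `S` and `T` are made up for illustration, and numpy 2-D arrays stand in for MATLAB row and column vectors; numpy broadcasts implicitly in the same way here.

```python
import numpy as np

# Hypothetical data: the same three samples, once as a row and once as a column.
S = np.array([[1.0, 2.0, 3.0]])        # shape (1, 3), row vector
T = np.array([[1.0], [2.0], [3.0]])    # shape (3, 1), column vector

# Broadcasting silently expands both operands to shape (3, 3).
diff = S - T
silent = np.linalg.norm(diff)          # Frobenius norm of a 3x3 matrix

# What was presumably intended: compare the vectors element by element.
intended = np.linalg.norm(S.ravel() - T.ravel())

print(diff.shape)   # (3, 3)
print(intended)     # 0.0, the two vectors hold the same values
print(silent)       # sqrt(12), roughly 3.46, with no warning or error
```

The two vectors are identical, yet the broadcast version reports a distance of about 3.46. Nothing fails, which is what makes the mistake hard to spot.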

What does it even mean to add a 1x3 matrix to a 3x1 matrix?

  • This is about how array operations in MATLAB work. In MATLAB, you can write things such as

        >> [1 2 3] + 1
        ans = [2 3 4]
    

    In this case, the operation `+ 1` is applied to every element of the array. In the same way, when you add a (1 x m) row vector and an (n x 1) column vector, you add the column to each element of the row (or you can view it the other way around). The result is as if you repeated the (n x 1) column m times horizontally to get an (n x m) matrix, repeated the row n times vertically to get another (n x m) matrix, and then added the two. So adding a row and a column is a shortcut for adding these two (n x m) matrices (and it runs faster than actually creating them). Each column of the result is the original column plus the row element at that column index. For example

        >> [1 2 3] + [1; 2; 3]
        ans = [2 3 4
               3 4 5
               4 5 6]
    

    A very practical example is, as I mentioned, getting all differences between the elements of a time series by writing `S - S'`. Another example: `(1:6)+(1:6)'` gives you the sums for all possible combinations when rolling two 6-sided dice.

    This works not only with addition and subtraction, but also with element-wise multiplication (`.*`) and other functions. You can do this across arbitrary dimensions, as long as the sizes of the inputs are compatible: in each dimension they either match or one of them is 1.
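
The "repeat, then add" description above can be spelled out by hand. This is a plain-Python sketch with nested lists (the names `row`, `col`, `col_tiled` and `row_tiled` are only for illustration), not how MATLAB or numpy implement it internally:

```python
from collections import Counter

row = [1, 2, 3]   # the (1 x m) row vector
col = [1, 2, 3]   # the (n x 1) column vector, stored as a flat list

# Step 1: repeat the column m times horizontally -> n x m matrix.
col_tiled = [[c for _ in row] for c in col]
# Step 2: repeat the row n times vertically -> n x m matrix.
row_tiled = [list(row) for _ in col]
# Step 3: add the two n x m matrices element by element.
explicit = [[a + b for a, b in zip(ra, rb)]
            for ra, rb in zip(col_tiled, row_tiled)]

# Broadcasting gives the same result without building the tiled matrices.
direct = [[c + r for r in row] for c in col]

print(explicit)            # [[2, 3, 4], [3, 4, 5], [4, 5, 6]]
print(explicit == direct)  # True

# The dice example: how often each total occurs among the 36 combinations.
dice = range(1, 7)
totals = Counter(a + b for a in dice for b in dice)
print(totals[7])           # 6
```

The `explicit` matrix matches the MATLAB output for `[1 2 3] + [1; 2; 3]` shown earlier.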

  • It means the same thing in MATLAB and numpy:

       import numpy as np

       Z = np.array([[1,2,3]])  # shape (1, 3)
       W = Z + Z.T              # (1, 3) + (3, 1) broadcasts to (3, 3)
       print(W)
    

    Gives:

       [[2 3 4]
        [3 4 5]
        [4 5 6]]
    

    It's called broadcasting [1]. I'm not a fan of MATLAB, but this is an odd criticism.

    [1] https://numpy.org/devdocs/user/basics.broadcasting.html#gene...

    • One of the really nice things Julia does is make broadcasting explicit. The way you would write this in Julia is

          Z = [1,2,3]
      
          W = Z .+ Z' # note the . before the + that makes this a broadcasted operation
      

      This has two big advantages. Firstly, it means that users get errors when the shapes of things aren't what they expected. A `DimensionMismatch` error is a lot easier to debug than a silently wrong result. Secondly, it means that Julia can let `exp(M)` etc. mean the matrix exponential, while the element-wise exponential is `exp.(M)`. This allows a lot of code to naturally work generically over both arrays and scalars (e.g. exp of a complex number will work correctly if written as a 2x2 matrix).
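
To illustrate the complex-number remark, here is a rough Python sketch. It assumes the usual representation of `a + bi` as the 2x2 matrix `[[a, -b], [b, a]]`; `matmul2`, `expm2` and `as_matrix` are hypothetical helpers written for this example, and the truncated Taylor series is for illustration only, not a robust matrix exponential.

```python
import cmath
import math

def matmul2(A, B):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def expm2(M, terms=30):
    """Matrix exponential of a 2x2 matrix via a truncated Taylor series."""
    result = [[1.0, 0.0], [0.0, 1.0]]   # identity, the k = 0 term
    term = [[1.0, 0.0], [0.0, 1.0]]
    for k in range(1, terms):
        term = matmul2(term, M)
        term = [[x / k for x in r] for r in term]   # now M^k / k!
        for i in range(2):
            for j in range(2):
                result[i][j] += term[i][j]
    return result

def as_matrix(z):
    """Represent the complex number z as a real 2x2 matrix."""
    return [[z.real, -z.imag], [z.imag, z.real]]

z = complex(0.5, 1.0)                   # arbitrary test value
M = as_matrix(z)

matrix_exp = expm2(M)                                  # like exp(M) in Julia
elementwise = [[math.exp(x) for x in r] for r in M]    # like exp.(M) in Julia
expected = as_matrix(cmath.exp(z))                     # exp(z), as a matrix

print(matrix_exp)    # agrees with expected up to rounding
print(expected)
print(elementwise)   # a different matrix, not of the form [[a, -b], [b, a]]
```

Only the matrix exponential agrees with `exp(z)`, which is why keeping `exp(M)` and `exp.(M)` distinct lets the same code work for scalars and for their matrix representations.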