
Comment by geokon

2 days ago

Yeah, I guess `partition` to me always looks like a dangerous tool - for instance, I have a sequence of numbers and I want to do a 5-point rolling average.

You could do `(partition 5 1 coll)` and then average each 5-element window in the resulting seq. It's very easy to reason about. But I'm guessing the performance will be abysmal? You're getting a lazy seq, and each time you access a 5-element window you're rerunning down your coll building the 5-unit subsets? Maybe if you start with an Array type it'll be okay, but you're always coercing to seq and to me it's hard to reason about.
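
Something like this, roughly (just a sketch, the function name is made up):

  ;; 5-point rolling average via partition: easy to read, but every
  ;; window is materialised as its own 5-element seq
  (defn rolling-avg [coll]
    (map #(/ (reduce + %) 5) (partition 5 1 coll)))

  (rolling-avg [1 2 3 4 5 6 7 8 9])
  ;; => (3 4 5 6 7)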

Taking the first 5 elements, then recurring on the list with the first element dropped, is probably better, but I find the code hard to read. Maybe it's a familiarity issue...

Yeah, like I said, I reach for `loop` first and foremost. This is what it would look like with comments if it were actually something complicated (although the comments are quite trivial here):

  (loop [coll [1 2 3 4 5 6 7 8 9] memo []]
    (if (< (count coll) 5) memo ;; Stop once there are fewer than 5 items
      (->> (take 5 coll)        ;; Take the first 5 elements
           (reduce +)           ;; Sum them
           (* 0.20)             ;; Multiply by 0.2, i.e. divide by 5
           (conj memo)          ;; Append to the end of the memo
           (recur (rest coll))  ;; Recur for the remaining elements
           ,,,)))

Realistically, if performance were a consideration, I would probably do:

  (loop [[a b c d e :as coll] [1 2 3 4 5 6 7 8 9] memo []]
    (if-not e memo
      (recur (rest coll) (conj memo (/ (+ a b c d e) 5)))))

Should be ~15 times faster to avoid the nested loop. If you want to change the min size it's still pretty clean:

  (loop [[a b c d e :as coll] [1 2 3 4 5 6 7 8 9] memo []]
    (if-not c memo
      (recur (rest coll)
             (conj memo (/ (+ a b c (or d 0) (or e 0)) ;; nil-safe sum for short tail windows
                           (cond e 5 d 4 c 3))))))
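
If you want to sanity-check the speed difference, a rough way is to time both versions on a bigger input (just a sketch; `time` is JIT-noisy and criterium would give more trustworthy numbers):

  ;; Crude comparison of the partition version vs. the destructuring loop
  (let [data (vec (range 100000))]
    (time (doall (map #(/ (reduce + %) 5) (partition 5 1 data))))
    (time (loop [[a b c d e :as coll] (seq data) memo []]
            (if-not e memo
              (recur (rest coll) (conj memo (/ (+ a b c d e) 5)))))))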

  • Thank you for the detailed response! Really thought-provoking. It wouldn't occur to me to write code like this. It seems like it'd be harder to parse than an imperative index-based solution, but I'm not sure. Do you find it easy to immediately grok? I'm figuring it's just familiarity.

    - What's the nested loop in the first solution that you've avoided? The `reduce`? the `count`?

    - `conj` feels very Lispy (I mean in contrast to Clojure, not C++) .. Isn't it going to have to run down the list every time to add an item?

    My outstanding concern is what I think is the constant coercion to lists/vectors. You also in effect know the result's size, but your runtime/compiler doesn't know that, so you aren't preallocating `memo` and it feels... wrong haha

    Just curious to hear your thoughts :)

    It's probably impossible to keep everything so nicely abstract and composable, but I wish it were smoother to just work with arrays, with random access. The current way of dealing with arrays is always a bit unwieldy in Clojure, and everything gets coerced to lists. Working with vectors, with `mapv`, `filterv`, etc. is helpful, but they don't have random access so it's not always the solution you want.

    • > It seems like it'd be harder to parse than an imperative index-based solution, but I'm not sure. Do you find it easy to immediately grok?

      These examples are incredibly easy to grok. It takes me a lot longer to grok any index-based solution ;-) Especially handling nils, missing indices, checking bounds... you quickly get buried in obvious issues that aren't a concern in Clojure.

      If it's more complicated, I use comments, like in the first example. I would say that when grokking more complicated loops you definitely read them slower on a per-character basis, but, as I've stated elsewhere, I don't think you're grokking the functionality any slower.

      > What's the nested loop in the first solution that you've avoided? The `reduce`? the `count`?

      Correct, `reduce` is a loop (and once `coll` has been `rest`-ed into a seq, `count` walks it too).

      > `conj` feels very Lispy (I mean in contrast to Clojure, not C++) .. Isn't it going to have to run down the list every time to add an item?

      No, vectors are closer to hash-maps than to lists. There are two main sequential collections in Clojure, lists and vectors. Lists are O(n) to append; vectors are effectively O(1). Lists are O(1) to prepend; I'm not actually sure of the performance characteristics of prepending to a vector.
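
      A quick REPL sketch of where `conj` puts things, which is why appending to the vector doesn't walk anything:

        ;; conj adds wherever it's cheap for the collection type
        (conj [1 2 3] 4)   ;; => [1 2 3 4]  appended, effectively O(1)
        (conj '(1 2 3) 4)  ;; => (4 1 2 3)  prepended, O(1)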

      > My outstanding concern is what I think is the constant coercion to lists/vectors.

      There is only one coercion from a vector to a seq in the code (the first time `rest` is called on the input vector), and that's only if your input was a vector to begin with. Chances are it will be a list. Also, I'm not sure it's even technically coercion in the computer science sense.
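
      For what it's worth, you can check what `rest` actually hands back (just a quick REPL sketch):

        (rest [1 2 3])          ;; => (2 3)
        (seq? (rest [1 2 3]))   ;; => true,  it's now a seq view over the vector...
        (list? (rest [1 2 3]))  ;; => false, ...but not a realised list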

      > You also in effect know the result's size, but your runtime/compiler doesn't know that, so you aren't preallocating `memo` and it feels... wrong

      This is not a concern in JavaScript (it always preallocates a bunch of memory for every vector IIRC), and I'm not sure about the Java implications, but given the performance people get out of Clojure I'm sure it's not an issue ;-)

      > The current way of dealing with arrays is always a bit unwieldy in Clojure, and everything gets coerced to lists. Working with vectors, with `mapv`, `filterv`, etc. is helpful, but they don't have random access so it's not always the solution you want.

      I can count on one hand the number of times I use `nth` in my project. To me it's crazy that people build solutions based on random access XD

      That being said, random access is O(1) on vectors, including those returned from `filterv` and `mapv` (technically O(log32 n)).
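
      A small illustration with made-up data:

        (def v (mapv #(* % %) (range 1000)))  ;; mapv returns a real vector
        (nth v 500)  ;; => 250000, effectively constant-time lookup
        (get v 500)  ;; => 250000, vectors are associative on their indices
        (v 500)      ;; => 250000, a vector is also a function of its indices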
