Comment by Agentlien

4 years ago

I remember clearly when I first gained a real appreciation for the importance of measuring and considering your actual data.

I was trying to optimize a slow project that did polyhedron dissection on large meshes (~1M polyhedra) in real time. As a first attempt I set my sights on a function that tried to find a certain point on a surface by making an educated guess and then iteratively bumping its estimate in the right direction. I figured "I can solve this analytically instead!".

I did, and it made things approximately twice as slow. After scratching my head and outputting some data, I realized my constant-time solution required a number of expensive operations, including a sqrt, while the iterative approach ran ~1 iteration on average and needed at most 3 iterations for only a handful of those millions of triangles.
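In hindsight the trade-off looked roughly like the sketch below. This isn't the original code; the quadratic, the function names, and the tolerance are made up purely to illustrate why a closed-form answer that pays for a sqrt on every call can lose to a well-seeded iteration that usually terminates after one cheap step.

```cpp
#include <cmath>

// Closed-form: constant time, but every call pays for a sqrt (and a division).
double solve_analytic(double a, double b, double c) {
    double disc = b * b - 4.0 * a * c;
    return (-b + std::sqrt(disc)) / (2.0 * a);  // assumes disc >= 0 and a != 0
}

// Iterative: Newton steps from a good initial guess. When the guess is close,
// it converges after ~1 iteration of a few multiplies and adds, with no sqrt.
double solve_iterative(double a, double b, double c, double guess) {
    double t = guess;
    for (int i = 0; i < 3; ++i) {               // observed worst case: 3 iterations
        double f  = a * t * t + b * t + c;
        double df = 2.0 * a * t + b;
        double step = f / df;
        t -= step;
        if (std::fabs(step) < 1e-6) break;      // usually taken on the first pass
    }
    return t;
}
```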

> I realized my constant-time solution required a number of expensive operations, including a sqrt, while the iterative approach ran ~1 iteration on average

Yeah! It's deeply unsatisfying to forgo algorithmic optimization, but it often isn't necessary.

E.g., when I do Advent of Code, I often write a naive algorithm that works correctly on the sample input but then runs for a really long time on the actual problem input. Before I go back and redo the algorithm, I often run it with PyPy instead of the standard Python interpreter, and 80% of the time it terminates quickly enough.

Similarly, whenever I happen to do AoC in C++, my naive algorithms magically turn out to be fast enough out of the box, so I don't even think about better algorithms. It's unfortunately counter to the intuition you get from CS programs that obsess over Big-O notation rather than implementation speed.

Big-O is really important, but the input size at which it starts to outweigh implementation speed seems to be much larger than our intuition suggests.

(of course it's also easy to write really bad algos that suck even on small input)
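A toy illustration of that crossover (entirely made up for this comment, and the exact crossover point is very machine-dependent, so measure on your own data): for small inputs, an O(n) scan over a contiguous array often beats an O(log n) tree lookup simply because of cache locality and predictable branches.

```cpp
#include <algorithm>
#include <chrono>
#include <cstdio>
#include <set>
#include <vector>

int main() {
    const int n = 64;                  // small input: asymptotics barely matter here
    const int queries = 1'000'000;

    std::vector<int> vec(n);
    std::set<int> tree;
    for (int i = 0; i < n; ++i) { vec[i] = i * 7; tree.insert(i * 7); }

    auto time_it = [&](const char* label, auto&& lookup) {
        auto start = std::chrono::steady_clock::now();
        long long hits = 0;
        for (int q = 0; q < queries; ++q) hits += lookup((q * 13) % (n * 7));
        auto end = std::chrono::steady_clock::now();
        auto us = std::chrono::duration_cast<std::chrono::microseconds>(end - start).count();
        std::printf("%s: hits=%lld, %lld us\n", label, hits, (long long)us);
    };

    // O(n), but contiguous memory and predictable branches.
    time_it("linear scan", [&](int key) {
        return std::find(vec.begin(), vec.end(), key) != vec.end();
    });
    // O(log n), but every step chases a pointer into a separate tree node.
    time_it("std::set   ", [&](int key) {
        return tree.count(key) > 0;
    });
}
```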

  • Unfortunately this was a case where the naive algorithm wasn't fast enough, so we needed to optimise. But this particular part of it wasn't the solution.

    We instead ended up changing our data format in a way which reduced cache misses.
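
    Roughly the kind of change I mean, as a sketch rather than our actual code or data layout: one common pattern is moving a hot loop from an array-of-structs to a struct-of-arrays, so it only pulls the fields it actually reads through the cache.

    ```cpp
    #include <vector>

    // Array-of-structs: a loop over positions also drags normals and flags
    // through the cache, wasting most of every cache line it touches.
    struct VertexAoS {
        float px, py, pz;   // position (hot in this loop)
        float nx, ny, nz;   // normal   (cold)
        unsigned flags;     // cold
    };

    // Struct-of-arrays: the hot data is contiguous, so cache lines are fully used.
    struct MeshSoA {
        std::vector<float> px, py, pz;
        std::vector<float> nx, ny, nz;
        std::vector<unsigned> flags;
    };

    float sum_x_aos(const std::vector<VertexAoS>& verts) {
        float s = 0.0f;
        for (const auto& v : verts) s += v.px;  // loads 28-byte structs to use 4 bytes each
        return s;
    }

    float sum_x_soa(const MeshSoA& mesh) {
        float s = 0.0f;
        for (float x : mesh.px) s += x;         // streams one dense float array
        return s;
    }
    ```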