Comment by praptak

4 hours ago

> Not if the data is small and in cache.

Isn't that just another way of saying what the author says in the previous paragraph, namely that "ideal SIMD speedup can only come from problems that are compute bound"?

If the cost of getting the input data into the cache is already large compared to processing it with the non-vectorized code, then SIMD cannot achieve a meaningful speedup. The opposite condition (processing is expensive compared to the cost of getting the data into the cache) is basically the definition of "compute bound".
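
To put rough numbers on it, here is a back-of-the-envelope sketch in Python. It is a toy Amdahl-style model, not a measurement: I'm assuming the load cost and the compute cost simply add up (no overlap, no prefetching), and that SIMD divides only the compute part by the lane count. The function name and the cost units are made up for illustration.

```python
def simd_speedup(load_cost, compute_cost, lanes):
    """Toy model: speedup from SIMD when only the compute part gets faster.

    Assumptions (not from the article): load and compute costs add up
    with no overlap, and SIMD divides compute cost by the lane count.
    """
    scalar_time = load_cost + compute_cost
    vector_time = load_cost + compute_cost / lanes
    return scalar_time / vector_time


# Data already in cache (load cost ~0): speedup approaches the lane count.
print(simd_speedup(load_cost=0, compute_cost=1, lanes=8))

# Loading dominates: speedup stays close to 1 no matter how wide the SIMD.
print(simd_speedup(load_cost=10, compute_cost=1, lanes=8))

# Compute dominates: most of the ideal speedup is recovered.
print(simd_speedup(load_cost=1, compute_cost=10, lanes=8))
```

Under this model the "small and in cache" case is just the limit where the load cost goes to zero, which is the compute-bound case by another name.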