Comment by nonameiguess

2 months ago

Ironically enough, I wrote a Sieve in NumPy 12 years ago or so in grad school messing around with Project Euler. I haven't touched it since and have not regularly used either NumPy or even Python in quite some time, but I still have the solutions in a private Github repo. The code is:

  def primes(n):
      """Input: A natural number n.
         Output: An array of primes, p < n."""
      assert n >= 2
      sieve = np.ones(n / 2, dtype=np.bool)
      for i in range(3, int(n ** 0.5) + 1, 2):
          if sieve[i / 2]:
              sieve[i * i / 2::i] = False
      return np.r_[2, 2 * np.nonzero(sieve)[0][1::] + 1]

It still relies on looping and I have no idea if you can fully vectorize everything, but it does at least get quite a bit of speedup thanks to the ability to index one array using another, so I can avoid the inner loop you have to use in pure Python.

I have absolutely no clue what that final line is doing and I'm not sure I understood it 12 years ago, either.

1 comment

nonameiguess

gjm11 2 months ago

sieve[i] is nonzero precisely when 2i+1 is an odd prime or i=0 (because it's initialized to all-1 and nothing ever zeros the first element).

So np.nonzero(sieve) is all the indices for which i=0 or 2i+1 is an odd prime.

Except that the behaviour of np.nonzero is a bit weird: it returns a tuple of arrays, one for each axis. (So, e.g., if you feed it a 2d array it gives you (a,b) such that the nonzero elements are at positions (a[0],b[0]), (a[1],b[1]), etc.) That's kinda reasonable behaviour, but it means that for the simple case of a 1-d array you get a 1-element tuple.

So np.nonzero(sieve)[0] is actually all the indices i for which i=0 or 2i+1 is an odd prime, and now you know why I didn't write "all the indices i for which ..." above.

The case i=0 would correspond to 1 being prime, which it isn't considered to be. So we throw that one out.

So np.nonzero(sieve)[0][1::] is all the indices i for which 2i+1 is an odd prime.

And so 2*np.nonzero(sieve)[0][1::]+1 is all the 2i+1 for which 2i+1 is an odd prime; that is, all the odd primes.

I don't think I've encountered np.r_ before, but it's pretty obvious what it's doing: prepending 2 to that array, so now it's "2 followed by all the odd primes" or, more simply, all the primes.

(Obviously, in the above phrases like "all the primes" mean "all the primes up to the limit defined by the size of the array".)