Comment by exDM69
2 days ago
Nice and concise description of quadtrees implemented with classical pointer chasing data structures.
A faster and (arguably) simpler way to construct quad/octrees is to use Morton codes, sorting, and searching over flat arrays [0]. It's also probably easier to implement.
The gist of it is that you quantize the coordinates, interleave the bits to construct a Morton code, and then sort. The sort uses numerical keys, so you can use a radix sort for O(n) complexity, which is much faster on a GPU (though on a single CPU core a comparison-based sort will probably win if the array isn't huge).
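For example, a 2D Morton encoder can be sketched with the classic "magic bits" spreading trick (a sketch; it assumes coordinates are already quantized to 16 bits per axis):

```rust
/// Spread the low 16 bits of x so bit i lands at position 2*i.
fn part1by1(mut x: u32) -> u32 {
    x &= 0x0000ffff;
    x = (x | (x << 8)) & 0x00ff00ff;
    x = (x | (x << 4)) & 0x0f0f0f0f;
    x = (x | (x << 2)) & 0x33333333;
    x = (x | (x << 1)) & 0x55555555;
    x
}

/// 2D Morton code: interleave bits of quantized x and y.
/// y goes in the odd (higher) bits, giving the yxyxyxyx layout below.
fn morton2(x: u16, y: u16) -> u32 {
    part1by1(x as u32) | (part1by1(y as u32) << 1)
}
```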
Now everything in the top half of the space is at the beginning of the array and the bottom half is at the end (assuming yxyxyxyx Morton code bits). Each of those halves is then split left/right, and each level below that alternates between vertical and horizontal splits.
To find the split point in the array, look at the first and last entries of the (sub)array you're working on and find the highest differing bit with (first ^ last).leading_zeros(). Then binary search for the first entry where that bit is high.
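A sketch of that split search, assuming the codes are already sorted; `partition_point` does the binary search:

```rust
/// Find the index of the first entry whose highest differing bit
/// (between the first and last code of the sorted slice) is set.
fn split_point(codes: &[u32]) -> usize {
    let (first, last) = (codes[0], *codes.last().unwrap());
    if first == last {
        return codes.len(); // all codes identical: nothing to split
    }
    // Position of the highest bit where first and last differ.
    let bit = 31 - (first ^ last).leading_zeros();
    let mask = 1u32 << bit;
    // Binary search for the first entry with that bit high.
    codes.partition_point(|&c| c & mask == 0)
}
```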
To traverse the quad/octree, repeat this process on the two halves you found. This can be done without recursion, in O(1) memory, using a fixed-size stack, because you know the depth of the "recursion" is at most half the number of bits in the Morton code.
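The traversal loop might look like this (a sketch; `visit` is a hypothetical callback that returns false to cull a subtree, e.g. after a frustum test against the node's bounds). The stack is sized for the worst case of one split per bit of a 32-bit key:

```rust
/// Iterative traversal over the implicit tree of a sorted Morton array.
/// `visit` sees each node's range and returns false to cull that subtree.
fn traverse(codes: &[u32], mut visit: impl FnMut(&[u32]) -> bool) {
    // Each split fixes one key bit, so depth <= 32 and 33 slots suffice.
    let mut stack = [(0usize, 0usize); 33];
    let mut top = 0;
    stack[top] = (0, codes.len());
    top += 1;
    while top > 0 {
        top -= 1;
        let (lo, hi) = stack[top];
        if hi == lo || !visit(&codes[lo..hi]) {
            continue; // empty range, or culled by the caller
        }
        let (first, last) = (codes[lo], codes[hi - 1]);
        if first == last {
            continue; // single cell: can't split further
        }
        // Split on the highest differing bit, as described above.
        let mask = 1u32 << (31 - (first ^ last).leading_zeros());
        let mid = lo + codes[lo..hi].partition_point(|&c| c & mask == 0);
        stack[top] = (mid, hi);
        stack[top + 1] = (lo, mid);
        top += 2;
    }
}
```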
If you used radix sorting for building the array, you can avoid the binary search if you store the histograms from the counting phase of the sorting. Storing the whole histogram may be too much, but just a few highest order bits can already help.
Although I've found experimentally that for small inputs (fewer than 10,000 objects) just sorting and searching is faster, as long as the whole array fits in the L2 cache. On my 2015 laptop a single core can sort 10k objects with 64-bit keys in 1 millisecond. Traversing the tree for frustum culling is about 5x faster than just scanning the entire array because a lot of geometry can be discarded very quickly.
With a good comparison-based sort, rebuilding the array after some changes is mighty fast because modern sorting algorithms are much faster on "almost sorted" inputs.
For range queries ("Find everything in the region" in the article), you can probably get better performance by using the BIGMIN/LITMAX method [1].
Now here's a brain teaser to delight your Friday: the article and the method I describe above are for storing points/centroids, but often you need to store axis-aligned bounding boxes (AABBs) instead. There is a very clever trick for this that I discovered independently but later found in some research papers. Can you come up with a (very) small change to the algorithm above to extend it to AABBs instead of centroids?
[0] Karras: Maximizing Parallelism in the Construction of BVHs, Octrees, and k-d Trees - https://research.nvidia.com/sites/default/files/pubs/2012-06...

[1] https://en.wikipedia.org/wiki/Z-order_curve#Use_with_one-dim...
Discretize the AABB with a step length proportional to the Morton code grid size, so that each sample point lands in a distinct but contiguous sequence of Morton codes enclosing the AABB; compute the code for each of those points, dump all of those Morton codes into the sort, and reverse-look-up the other AABBs with the same Morton code :)
The problem comes at scale, when you have many AABBs of different sizes. Memory pressure can be punishing, and for distributed solutions you are bottlenecked by the network. Is there a faster way with fewer points? I would love to know!
Good attempt, but it's simpler than that.
For each AABB, you take the Morton codes of the minimum and maximum corners and store them in the array.
But for sorting and searching, construct a sort key from the pair of Morton codes so that the most significant bits are the common prefix of the two codes and the least significant bits are the "level" (the length of the common prefix in bits).
E.g. for a 3D octree with 64-bit keys, you'd have 1 unused bit, 57 = 19 * 3 bits of Morton code prefix, and 6 bits of level. A 2D quadtree with 32-bit keys would have 1 unused bit, 26 = 13 * 2 bits of prefix, and 5 bits of level.
You could also store just the sort key, but I've kept both AABB Morton codes so that I can reconstruct the quantized AABBs for culling.
When you sort by this key, you get a linear octree where the beginning of the array holds all the AABBs that overlap the subdivision plane, then everything on the left side of the plane, and the rest is the right side. When traversing the tree, you need two searches per node: one to find the entries belonging to this node and one to find the split point between the left and right sides.
Having the "level" in the least significant bits works as a tiebreaker in the sorting to ensure that entries closer to the root of the tree are in the beginning of the array.
I wrote vaguely about this solution the last time this link came up (which wasn't that long ago, I think?), and you've filled in all the details plus the essential Karras paper link; great post :)
I give up, tell me the AABB trick please
I have shared the trick in this thread above.