0726, 15:30–16:00 (US/Eastern), 32124
AdaptiveHierarchicalRegularBinning.jl
computes a hierarchical spacepartitioning tree for a given set of points of arbitrary dimensions, that divides the space and stores the reordered points offering efficient access. Spacepartitioning data structures are vital for algorithms that exploit spatial distance to reduce computational complexity, see for example the Fast Multipole Method and Nearest Neighbors.
Model
Assuming a set of n
points V
in a d
dimensional space, we partition the space into regular hierarchical bins. The resulting data structure is a sparse tree T
with, at most, 2^d
child nodes per node and a maximum depth of L
.
Normalization
The given set of points V
is mapped into a d
dimensional unit hypercube via an affine transformation that scales and translates V
resulting in Vn
.
Binning
We split each dimension of the unit hypercube in half and recursively continue splitting each resulting hypercube in the same manner. Each partition is called a bin. The subdivision stops at a maximum depth L
or when a bin contains k
or fewer points. The recursive splitting process is recorded as a hierarchical tree data structure.
1D Encoding
We use Morton encoding to map the d
dimensional set Vn
to a onedimensional spacefilling curve. Each point in Vn
is assigned an index in the reduced space resulting in R
. Elements of R
are bitfields, each one consisting of L
groups of d
bits. These groups describe the position of the point in the corresponding level of the tree T
. Points described by morton indices with equal most signifficant digits belong in the same bin. Thus, sorting R
results in Rs
which defines the sparse tree.
Implementation
Cachelocality and parallel programming are some of the techniques we used, to make our implementation performant.
Cache locality
Cache locality offers fast memory access that greatly improves the performance of our algorithm.

The set of points
V
, and by extensionVn
, is defined as aMatrix
of size(d, n)
. The leading dimension describes the number of dimensions in the hyperspace resulting in an access pattern that is more friendly to the cache since all points ofV
have their corresponding coordinates densely packed in memory. 
Sorting
R
andVn
offers a memory layout that is cachefriendly. Since we accessV
through theT
tree, we only access points that are in the same bin.Rs
describes a bin ofT
with a contiguous block of memory, thus preserving cachelocality. 
T
is not a linkedtree.T
is a tree stored densely in memory as aVector{Node}
. EachNode
ofT
is aware of their children and parent using their indices in this denseVector{Node}
.
Parallel Partial Sorting
The Morton curve bitfield denotes the tree node of each point. The partial sorting using the Most Significant Digit (MSD) radixsort, places the points to the corresponding bins. Points that fall within the same leaf node do not get sorted. The radixsort runs in parallel: the partition of digits is done with a parallel countsort, and then each digit subset is processed independently in parallel.
Adaptive Tree
Empty bins, that is, nodes that do not contain any points, are not stored or referenced explicitly.
References

Sun, X., & Pitsianis, N. P. (2001). A Matrix Version of the Fast Multipole Method. In SIAM Review (Vol. 43, Issue 2, pp. 289–300). Society for Industrial & Applied Mathematics (SIAM). https://doi.org/10.1137/s0036144500370835

Curtin, R., March, W., Ram, P., Anderson, D., Gray, A. & Isbell, C.. (2013). TreeIndependent DualTree Algorithms. Proceedings of the 30th International Conference on Machine Learning, in Proceedings of Machine Learning Research 28(3):14351443 Available from https://proceedings.mlr.press/v28/curtin13.html.

Carrier, J., Greengard, L., & Rokhlin, V. (1988). A Fast Adaptive Multipole Algorithm for Particle Simulations. In SIAM Journal on Scientific and Statistical Computing (Vol. 9, Issue 4, pp. 669–686). Society for Industrial & Applied Mathematics (SIAM). https://doi.org/10.1137/0909044

Yokota, R. (2013). An FMM Based on Dual Tree Traversal for ManyCore Architectures. In Journal of Algorithms & Computational Technology (Vol. 7, Issue 3, pp. 301–324). SAGE Publications. https://doi.org/10.1260/17483018.7.3.301

Curtin, R.R. (2015). Faster DualTree Traversal for Nearest Neighbor Search. In: Amato, G., Connor, R., Falchi, F., Gennaro, C. (eds) Similarity Search and Applications. SISAP 2015. Lecture Notes in Computer Science(), vol 9371. Springer, Cham. https://doi.org/10.1007/9783319250878_7

Greengard, L. (1990). The numerical solution of the n‐body problem. Computers in physics, 4(2), 142152.

Cho, M., Brand, D., Bordawekar, R., Finkler, U., Kulandaisamy, V., & Puri, R. (2015). PARADIS: An efficient parallel algorithm for inplace radix sort. Proceedings of the VLDB Endowment, 8(12), 15181529.

Zagha, M., & Blelloch, G. E. (1991, August). Radix sort for vector multiprocessors. In Proceedings of the 1991 ACM/IEEE conference on Supercomputing (pp. 712721).

Morton, G. M. (1966). A computer oriented geodetic data base and a new technique in file sequencing.