AUTHOR(S):Rahul S. Sampath, Hari Sundar, Shravan K. Veerapaneni
ABSTRACT: We present fast adaptive parallel algorithms to compute the sum of N Gaussians at N points. Direct sequential computation of this sum would take O(N2) time.
The parallel time complexity estimates for our algorithms are O(N/np) for uniform point distributions and
O( (N/np) log (N/np) + nplog np )
for non-uniform distributions using np CPUs. We incorporate a plane-wave representation of the Gaussian kernel which permits "diagonal translation". We use parallel octrees and a new scheme for translating the plane-waves to efficiently handle non-uniform distributions. Computing the transform to six-digit accuracy at 120 billion points took approximately 140 seconds using 4096 cores on the Jaguar supercomputer.
Our implementation is "kernel-independent" and can handle other "Gaussian-type" kernels even when explicit analytic expression for the kernel is not known. These algorithms form a new class of core computational machinery for solving parabolic PDEs on massively parallel architectures.
Osni Marques (Chair) - Lawrence Berkeley National Laboratory