Parallelized Hartree-Fock Code for Scalable Structural and Electronic Simulation of Large Nanoscale Molecules
SESSION: Student Poster Reception
EVENT TYPE: Poster, ACM Student Poster
TIME: 5:15PM - 7:00PM
AUTHOR(S): David C. Goode
ABSTRACT: The Hartree-Fock approximation to Schrödinger's equation for multi-electron Born-Oppenheimer molecular Hamiltonians is a well-established method for computing the quantum mechanical, non-relativistic energy levels, electronic properties, and nuclear structures of molecules. I worked on a new code implementing this approximation with the goal of achieving better parallel performance on petaflop supercomputers and beyond, allowing the study of larger molecules, on the order of thousands of atoms, at reasonable accuracy; current standard methods have difficulty scaling to this size. I parallelized the algorithm using the standard MPI library for interprocessor communication and the ScaLAPACK library for parallel dense-matrix linear algebra. I also implemented features that bring the method's accuracy in line with that of standard Hartree-Fock implementations such as NWChem, including open-shell orbital support and support for non-symmetric orbitals above s to handle larger atoms. Performance data will be collected for a variety of large molecules by running the code on New York Blue's BlueGene/L supercomputer at varying processor counts, up to the full machine's 18,432 nodes. Scaling from 128 to 1024 processors has already performed well on a test molecule, and results for smaller molecules are compared against NWChem to check accuracy. We hope to achieve good scaling for very large molecules, so that previously impractical molecules can be simulated accurately in a reasonable amount of time; the code is also designed to accommodate further accuracy improvements and extensions. If time permits, we may begin simulating the relevant electronic properties of known molecules in order to test the program's utility for solar collector analysis.
The immediate goal for applying this code is to aid research into nanoparticles that could be synthesized for use in solar power collection. We intend to simulate their structure and electronic properties, including charge distributions and conductance, to discern whether they are good candidates for power collection before actually synthesizing them in a laboratory. Realistically sized nanoparticles of this type contain thousands of atoms and exhibit important large-scale properties; based on my mentors' research and experience in the field, the scaling of existing solutions is insufficient at this size.
The language used is C++ with the standard MPI implementation on BlueGene. The code is fairly portable, also running on the OpenMPI implementation distributed with Ubuntu Linux. The program makes heavy use of the LAPACK (serial) and ScaLAPACK (parallel) linear algebra libraries, both of which have implementations for BlueGene and for regular Linux; ScaLAPACK uses MPI through the BLACS communication library. The Hartree-Fock method employed uses contracted Gaussian basis functions, which permit easy analytic integration while achieving accuracy comparable to Slater-type orbitals (STOs). Differentiation-based generating functions produce the higher, non-symmetric orbitals, which consist of spherically symmetric Gaussians multiplied by asymmetric monomials in x, y, and z, all in position space.
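To illustrate why Gaussian basis functions permit easy analytic integration, the sketch below evaluates the overlap of two s-type primitive Gaussians in closed form via the Gaussian product theorem. The function names are hypothetical; the code's actual integral routines are not shown here.

```cpp
#include <cmath>

const double PI = 3.14159265358979323846;

// Overlap of two unnormalized s-type primitive Gaussians exp(-a|r-A|^2)
// and exp(-b|r-B|^2), evaluated analytically via the Gaussian product
// theorem: S = (pi/p)^(3/2) * exp(-(a*b/p)|A-B|^2), with p = a + b.
double s_overlap(double a, double b, const double A[3], const double B[3]) {
    double p = a + b;
    double r2 = 0.0;
    for (int i = 0; i < 3; ++i) {
        double d = A[i] - B[i];
        r2 += d * d;
    }
    return std::pow(PI / p, 1.5) * std::exp(-(a * b / p) * r2);
}

// Normalization constant of an s-type primitive Gaussian.
double s_norm(double a) { return std::pow(2.0 * a / PI, 0.75); }
```

A contracted basis function is then a fixed linear combination of such primitives, so its overlaps reduce to weighted sums of these closed-form terms.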
The program was run for debugging purposes on an Ubuntu Linux virtual machine with the relevant libraries installed from their standard packages. It can run in both serial and parallel modes, although the algorithms are optimized for parallel runs. Parallelization is achieved through a 2D block-cyclic distribution of most O(N^2) matrices involved in the eigenvector calculations, such as the coefficient matrix, the Fock matrix, and the density matrix. The program supports open-shell spin configurations by simultaneously solving one density matrix per spin, as outlined in Modern Quantum Chemistry by Szabo and Ostlund (1996) and other texts. Results for simple artificial molecules such as H_16, H_2, and H_2He all agree with NWChem, verifying the parallelized code's accuracy.
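The block-cyclic distribution can be sketched as the standard ScaLAPACK-style index mapping below, applied independently to rows and columns to give the 2D layout. This is a minimal illustration of the distribution scheme, not the code's actual implementation; the function name is hypothetical.

```cpp
#include <utility>

// 1D block-cyclic mapping along one matrix dimension. The 2D
// distribution applies this independently to rows (over process-grid
// rows) and columns (over process-grid columns). Given global index g,
// block size nb, and np processes along this dimension, return
// {owning process, local index on that process}.
std::pair<int, int> block_cyclic_map(int g, int nb, int np) {
    int block = g / nb;            // global block containing g
    int owner = block % np;        // blocks are dealt out round-robin
    int local_block = block / np;  // blocks the owner holds before this one
    int local = local_block * nb + (g % nb);
    return {owner, local};
}
```

For example, with block size 2 on 2 processes, global rows 0-1 land on process 0, rows 2-3 on process 1, rows 4-5 back on process 0, and so on, which keeps both work and storage balanced.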
Parallel performance is achieved through numerous optimizations to the naive O(N^4) algorithm. Two-electron integrals are prescreened with a much faster upper-bounding algorithm and discarded if they fall below a minimum value. The surviving integrals are placed in dynamically sized storage to take advantage of the asymptotic O(N^2) density of non-trivial values in larger molecules; this saves over 90% of computation time and storage for a 700-atom test molecule. The electron density matrix is broadcast and reused for both the energy and the Fock matrix calculations. The coefficient matrix is shared in a novel checkerboard pattern that provides each processor with the transposed row it needs for its calculations while balancing two communications per step across every processor. This is made possible by symmetric block sizes, which make transpose locations uniform across every block stored cyclically on a given node.
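The prescreening step might look like the sketch below. The exact upper-bounding algorithm is not specified above, so this assumes the common Cauchy-Schwarz-style estimate |(ij|kl)| <= q[ij]*q[kl] with q[ij] = sqrt((ij|ij)) precomputed per index pair; the struct and function names are hypothetical.

```cpp
#include <vector>

// Sketch of two-electron integral prescreening (assumed
// Cauchy-Schwarz-style bound; the code's actual algorithm may differ).
// Only index pairs whose cheap upper bound exceeds the threshold are
// kept, in dynamically sized storage, so trivial integrals are never
// computed or stored.
struct IntegralIndex { int ij, kl; };

std::vector<IntegralIndex> prescreen(const std::vector<double>& q,
                                     double threshold) {
    std::vector<IntegralIndex> survivors;  // dynamically sized storage
    int n = static_cast<int>(q.size());
    for (int ij = 0; ij < n; ++ij)
        for (int kl = 0; kl <= ij; ++kl)        // permutational symmetry
            if (q[ij] * q[kl] >= threshold)     // cheap upper bound
                survivors.push_back({ij, kl});  // keep only non-trivial
    return survivors;
}
```

Because distant charge distributions overlap negligibly, the number of surviving pairs grows roughly as O(N^2) rather than O(N^4) in large molecules, which is what makes the savings above possible.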
I've attached a preliminary poster that duplicates much of this text; the final version will contain quantitative data on scalability and accuracy, as well as illustrations such as the communication topology used, graphs of results, and diagrams of optimized molecules.