SC is the International Conference for High Performance Computing, Networking, Storage and Analysis

SCHEDULE: NOV 13-19, 2010

Diagnosis, Tuning and Redesign for Multicore Performance: A Case Study of the Fast Multipole Method

SESSION: Intra-Node Method Optimization


TIME: 10:30AM - 11:00AM

SESSION CHAIR: William Harrod

AUTHOR(S): Aparna Chandramowlishwaran, Kamesh Madduri, Richard Vuduc


Given a program and a multisocket, multicore system, what is the process by which one understands and improves its performance and scalability? We describe an approach in the context of improving within-node scalability of the fast multipole method (FMM). Our process consists of a systematic sequence of modeling, analysis, and tuning steps, beginning with simple models, and gradually increasing their complexity in the quest for deeper performance understanding and better scalability. For the FMM, we significantly improve within-node scalability; for example, on a quad-socket Intel Nehalem-EX system, we show speedups of 1.7x over the previous best multithreaded implementation, 19.3x over a sequential but highly tuned (e.g., SIMD-vectorized) code, and match or outperform a state-of-the-art GPGPU implementation. Our study sheds new light on the form of a more general performance analysis and tuning process that other multicore/manycore tuning practitioners (end-user programmers) and automated performance analysis and tuning tools could themselves apply.

Chair/Author Details:

William Harrod (Chair) - DARPA

Aparna Chandramowlishwaran - Georgia Institute of Technology

Kamesh Madduri - Lawrence Berkeley National Laboratory

Richard Vuduc - Georgia Institute of Technology

