SC is the International Conference for High Performance Computing, Networking, Storage and Analysis

SCHEDULE: NOV 13-19, 2010

3.5-D Blocking Optimization for Stencil Computations on Modern CPUs and GPUs

SESSION: Large-Scale Stencil Computations

TIME: 2:00PM - 2:30PM

SESSION CHAIR: Kengo Nakajima

AUTHOR(S): Anthony Nguyen, Nadathur Satish, Jatin Chhugani, Changkyu Kim, Pradeep Dubey

Stencil computation sweeps over a spatial grid over multiple time steps to perform nearest-neighbor computations. The bandwidth-to-compute requirement for a large class of stencil kernels is very high, and their performance is bound by the available memory bandwidth. Since memory bandwidth grows more slowly than compute, the performance of stencil kernels will not scale with increasing compute density. We present a novel 3.5D-blocking algorithm that performs 2.5D spatial blocking and 1D temporal blocking of the input grid into on-chip memory for both CPUs and GPUs. The resulting algorithm is amenable to both thread-level and data-level parallelism, and scales near-linearly with the SIMD width and the number of cores. We are faster than or comparable to state-of-the-art stencil implementations on CPUs and GPUs. For the 7-point stencil with single-precision floating-point inputs, we are 1.5X faster on CPUs and 1.8X faster on GPUs than previously reported numbers. For Lattice Boltzmann methods, we are 2.1X faster on CPUs.
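The abstract names the technique but not its data flow. The following is a minimal pure-Python sketch of one plausible reading of 3.5D blocking for a 7-point Jacobi stencil, under these assumptions (not taken from the paper): the 2.5D spatial part tiles the x-y plane and streams each tile along z, the 1D temporal part fuses `bt` time steps per pass by giving each tile a ghost halo `bt` cells wide, and boundary cells are held fixed. The names `point`, `naive`, `blocked_35d` and the coefficients `alpha`, `beta` are hypothetical. The blocked version is meant to reproduce a naive full-grid sweep exactly.

```python
import copy


def point(lo, mid, hi, y, x, alpha, beta):
    """7-point stencil at (y, x) of plane `mid`; `lo`/`hi` are the z-1/z+1 planes."""
    nbrs = (mid[y][x - 1] + mid[y][x + 1] + mid[y - 1][x] + mid[y + 1][x]
            + lo[y][x] + hi[y][x])
    return alpha * mid[y][x] + beta * nbrs


def naive(grid, steps, alpha, beta):
    """Reference: one full sweep per time step; boundary cells stay fixed."""
    nz, ny, nx = len(grid), len(grid[0]), len(grid[0][0])
    cur = copy.deepcopy(grid)
    for _ in range(steps):
        nxt = copy.deepcopy(cur)
        for z in range(1, nz - 1):
            for y in range(1, ny - 1):
                for x in range(1, nx - 1):
                    nxt[z][y][x] = point(cur[z - 1], cur[z], cur[z + 1], y, x, alpha, beta)
        cur = nxt
    return cur


def blocked_35d(grid, steps, alpha, beta, bx, by, bt):
    """Hypothetical 3.5D blocking: bx*by tiles in x-y, streamed along z,
    with up to bt time steps fused per pass over the grid."""
    nz, ny, nx = len(grid), len(grid[0]), len(grid[0][0])
    cur = copy.deepcopy(grid)
    done = 0
    while done < steps:
        T = min(bt, steps - done)
        out = copy.deepcopy(cur)
        for y0 in range(0, ny, by):
            for x0 in range(0, nx, bx):
                y1, x1 = min(y0 + by, ny), min(x0 + bx, nx)
                # Tile plus a ghost halo of T cells, clipped at the grid edge.
                ya, yb = max(0, y0 - T), min(ny, y1 + T)
                xa, xb = max(0, x0 - T), min(nx, x1 + T)
                h, w = yb - ya, xb - xa
                # levels[t][z]: plane z of this tile after t steps (t < T).
                # At most about three planes per level are alive at once.
                levels = [{} for _ in range(T)]
                for zin in range(nz + T):
                    if zin < nz:  # stream one new input plane in
                        levels[0][zin] = [row[xa:xb] for row in cur[zin][ya:yb]]
                    for t in range(1, T + 1):
                        z = zin - t  # plane z at step t is now computable
                        if not 0 <= z < nz:
                            continue
                        prev = levels[t - 1]
                        new = [row[:] for row in prev[z]]  # edges/boundaries copied
                        if 0 < z < nz - 1:
                            for y in range(1, h - 1):
                                for x in range(1, w - 1):
                                    new[y][x] = point(prev[z - 1], prev[z], prev[z + 1],
                                                      y, x, alpha, beta)
                        prev.pop(z - 1, None)  # step t no longer needs plane z-1
                        if t < T:
                            levels[t][z] = new
                        else:  # final fused step: write back only the tile interior
                            for y in range(y0, y1):
                                out[z][y][x0:x1] = new[y - ya][x0 - xa:x1 - xa]
        cur = out
        done += T
    return cur
```

In this sketch each tile redundantly recomputes up to `T` halo cells on every side, and in exchange the grid is read and written once per `T` time steps instead of once per step. That trade is what lowers the bandwidth-to-compute demand described above. A real CPU or GPU implementation would presumably keep the per-level planes in cache or GPU shared memory and vectorize the `x` loop; this code only illustrates the blocking order.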

Chair/Author Details:

Kengo Nakajima (Chair) - University of Tokyo

Anthony Nguyen - Intel Corporation

Nadathur Satish - Intel Corporation

Jatin Chhugani - Intel Corporation

Changkyu Kim - Intel Corporation

Pradeep Dubey - Intel Corporation

The full paper can be found in the ACM Digital Library and the IEEE Computer Society Digital Library.

Sponsors: IEEE, ACM