Ingredients for Good Parallel Performance on Multicore-Based Systems
SESSION: M16: Ingredients for Good Parallel Performance on Multi-Core Processors
EVENT TYPE: Tutorial
TIME: 1:30PM - 5:00PM
Presenter(s): Georg Hager, Gerhard Wellein
ABSTRACT: This tutorial covers program optimization techniques for multicore processors and the systems they are used in. It concentrates on the dominant parallel programming paradigms, MPI and OpenMP.
We start with an architectural overview of multicore processors, pointing out peculiarities such as shared vs. separate caches, bandwidth bottlenecks, and ccNUMA characteristics. We then show typical performance features, including synchronization overhead, intranode MPI bandwidths and latencies, ccNUMA locality, and bandwidth saturation (in cache and memory), to pinpoint the influence of system topology and thread affinity on the performance of typical parallel programming constructs. Multiple ways of probing system topology and establishing affinity, either by explicit coding or with separate tools, are demonstrated. Finally, we elaborate on programming techniques that help establish optimal parallel memory access patterns and/or cache reuse, with an emphasis on leveraging shared caches to improve performance.
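As an illustration of the ccNUMA locality issue mentioned above, the following is a minimal sketch, not material taken from the tutorial itself. It assumes a "first touch" page placement policy (the usual default on Linux ccNUMA systems), under which a memory page is mapped into the locality domain of the thread that first writes to it. The function names alloc_first_touch and scaled_add are hypothetical and chosen only for this example. Both loops use the same static OpenMP schedule, so each thread later works mostly on the pages it initialized itself, provided that threads stay pinned to their cores.

```c
#include <stddef.h>
#include <stdlib.h>

/* Allocate n doubles and initialize them in parallel.
 * Assumption: first-touch page placement, so each page ends up in the
 * locality domain of the thread that writes it first. */
double *alloc_first_touch(size_t n, double value)
{
    double *a = malloc(n * sizeof *a);
    if (a == NULL)
        return NULL;

    /* Same static schedule as the compute loop below. */
    #pragma omp parallel for schedule(static)
    for (long i = 0; i < (long)n; ++i)
        a[i] = value;

    return a;
}

/* y[i] += s * x[i], using the same iteration-to-thread mapping
 * as the initialization loop. */
void scaled_add(double *y, const double *x, double s, size_t n)
{
    #pragma omp parallel for schedule(static)
    for (long i = 0; i < (long)n; ++i)
        y[i] += s * x[i];
}
```

Compiled without OpenMP support, the pragmas are ignored and the code runs serially with the same results. With OpenMP enabled, the locality benefit depends on thread affinity being enforced, for example through the standard OMP_PROC_BIND and OMP_PLACES environment variables or an external pinning tool.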
Georg Hager - Erlangen Regional Computing Center
Gerhard Wellein - Erlangen Regional Computing Center