Reducing Cache Pollution Through Detection and Elimination of Non-Temporal Memory Accesses
SESSION: Optimization Strategies on the Node
EVENT TYPE: Paper
TIME: 4:00PM - 4:30PM
SESSION CHAIR: Karl Fuerlinger
AUTHOR(S): Andreas Sandberg, David Eklöv, Erik Hagersten
ABSTRACT: Contention for shared cache resources has been recognized as a major bottleneck for multicores—especially for mixed workloads of independent applications. While most modern processors implement instructions to manage caches, these instructions are largely unused due to a lack of understanding of how to best leverage them.
This paper introduces a classification of applications into four cache usage categories. We discuss how applications from different categories affect each other's performance indirectly through cache sharing and devise a scheme to optimize such sharing. We also propose a low-overhead method to automatically find the best per-instruction cache management policy.
We demonstrate how the indirect cache-sharing effects of mixed workloads can be tamed by automatically altering some instructions to better manage cache resources. Practical experiments show that our software-only method can improve application performance by up to 35% on x86 multicore hardware.
Karl Fuerlinger (Chair) - University of California, Berkeley
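
The cache-pollution effect the abstract describes can be illustrated with a toy model. The sketch below is not the authors' method or tooling. It simulates a small, fully associative LRU cache shared by two hypothetical applications: "reuser", which loops over a small working set, and "streamer", which touches each address only once. The function names, the trace shape, and the cache size are all assumptions chosen for illustration. Marking an application as bypassing the cache stands in for replacing its memory accesses with non-temporal ones.

```python
from collections import OrderedDict


def simulate(trace, capacity, bypass_apps=frozenset()):
    """Replay (app, address) accesses against a shared LRU cache.

    Accesses from apps listed in bypass_apps are treated as non-temporal:
    they may hit a line that is already cached, but a miss never
    allocates a new line. Returns {app: (hits, accesses)}.
    """
    cache = OrderedDict()  # keys ordered from least to most recently used
    stats = {}
    for app, addr in trace:
        hits, total = stats.get(app, (0, 0))
        key = (app, addr)
        if key in cache:
            cache.move_to_end(key)  # refresh LRU position on a hit
            hits += 1
        elif app not in bypass_apps:
            cache[key] = True
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict the least recently used line
        stats[app] = (hits, total + 1)
    return stats


def make_trace(rounds=50, working_set=6, stream_burst=8):
    """Interleave a reused working set with bursts of never-reused addresses."""
    trace = []
    next_stream_addr = 0
    for _ in range(rounds):
        for addr in range(working_set):
            trace.append(("reuser", addr))
        for _ in range(stream_burst):
            trace.append(("streamer", next_stream_addr))
            next_stream_addr += 1
    return trace


trace = make_trace()
shared = simulate(trace, capacity=8)
managed = simulate(trace, capacity=8, bypass_apps={"streamer"})

for label, stats in (("shared", shared), ("managed", managed)):
    for app, (hits, total) in sorted(stats.items()):
        print(f"{label:8s} {app:9s} hits={hits}/{total}")
```

In this toy trace the streaming application evicts the other application's entire working set between reuses, so "reuser" never hits in the shared configuration. Once the streaming accesses stop allocating cache lines, "reuser" misses only on its first pass, while "streamer" is no worse off because it never reused its data in the first place. Real hardware adds set associativity, prefetching, and timing effects that this model ignores.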