The Sharing Tracker: Using Ideas from Cache Coherence Hardware to Reduce Off-Chip Memory Traffic with Non-Coherent Caches
SESSION: Optimization Strategies on the Node
EVENT TYPE: Paper
TIME: 3:30PM - 4:00PM
SESSION CHAIR: Karl Fuerlinger
AUTHOR(S): David Tarjan, Kevin Skadron
ABSTRACT: Graphics Processing Units (GPUs) have recently emerged as a new platform for high-performance, general-purpose computing, due to their combination of high peak performance and high memory bandwidth. Because current GPUs employ deep multithreading to hide latency, they have only small, per-core caches to capture reuse and eliminate unnecessary off-chip accesses. We show that for general-purpose workloads, the ability to copy cache lines between private caches captures inter-core temporal locality and provides substantial reductions in off-chip bandwidth requirements. We introduce the sharing tracker, which tracks cache lines in the private caches on a chip. Because it is only a performance hint, it can track them imprecisely. The sharing tracker is so effective at capturing inter-core reuse that the L2 cache can be eliminated entirely. Although motivated by the GPU, the sharing tracker is not specific to it and hence could be used in other manycore organizations.
Karl Fuerlinger (Chair) - University of California, Berkeley