BEGIN:VCALENDAR
PRODID:-//Microsoft Corporation//Outlook MIMEDIR//EN
VERSION:1.0
BEGIN:VEVENT
DTSTART:20101118T213000Z
DTEND:20101118T220000Z
LOCATION:393
DESCRIPTION;ENCODING=QUOTED-PRINTABLE:ABSTRACT: Graphics Processing Units (GPUs) have recently emerged as a new platform for high-performance, general-purpose computing, due to their combination of high peak performance and high memory bandwidth. Because current GPUs employ deep multithreading to hide latency, they have only small, per-core caches to capture reuse and eliminate unnecessary off-chip accesses. We show that for general-purpose workloads, the ability to copy cache lines between private caches captures inter-core temporal locality and provides substantial reductions in off-chip bandwidth requirements. We introduce the sharing tracker, which tracks cache lines in the private caches on a chip. The tracking can be imprecise because it serves only as a performance hint. The sharing tracker is so effective at capturing inter-core reuse that the L2 can be eliminated entirely. It is motivated by, but not specific to, the GPU and hence could be used in other manycore organizations.
SUMMARY:The Sharing Tracker: Using Ideas from Cache Coherence Hardware to Reduce Off-Chip Memory Traffic with Non-Coherent Caches
PRIORITY:3
END:VEVENT
END:VCALENDAR
