BEGIN:VCALENDAR
VERSION:2.0
X-WR-TIMEZONE:America/Chicago
PRODID:-//Apple Inc.//iCal 3.0//EN
CALSCALE:GREGORIAN
X-WR-CALNAME:The Sharing Tracker: Using Ideas from Cache Coherence Hardware to Reduce Off-Chip Memory Traffic with Non-Coherent Caches
METHOD:PUBLISH
BEGIN:VTIMEZONE
TZID:America/Chicago
BEGIN:DAYLIGHT
TZOFFSETFROM:-0600
TZOFFSETTO:-0500
DTSTART:20070311T020000
RRULE:FREQ=YEARLY;BYMONTH=3;BYDAY=2SU
TZNAME:CDT
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:-0500
TZOFFSETTO:-0600
DTSTART:20071104T020000
RRULE:FREQ=YEARLY;BYMONTH=11;BYDAY=1SU
TZNAME:CST
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
SEQUENCE:2
DTSTART;TZID=America/Chicago:20101118T153000
DESCRIPTION:ABSTRACT: Graphics Processing Units (GPUs) have recently emerged as a new platform for high-performance\, general-purpose computing\, due to their combination of high peak performance and high memory bandwidth. Because current GPUs employ deep multithreading to hide latency\, they have only small\, per-core caches to capture reuse and eliminate unnecessary off-chip accesses. We show that for general-purpose workloads\, the ability to copy cache lines between private caches captures inter-core temporal locality and provides substantial reductions in off-chip bandwidth requirements. We introduce the sharing tracker\, which tracks cache lines in the private caches on a chip imprecisely\, since it serves only as a performance hint. The sharing tracker is so effective at capturing inter-core reuse that the L2 cache can be eliminated entirely. It is motivated by but not specific to the GPU and hence could be used in other manycore organizations.
UID:pap280@sc10.supercomputing.org
SUMMARY:The Sharing Tracker: Using Ideas from Cache Coherence Hardware to Reduce Off-Chip Memory Traffic with Non-Coherent Caches
DTEND;TZID=America/Chicago:20101118T160000
LOCATION:393
END:VEVENT
END:VCALENDAR
