SC is the International Conference for High Performance Computing, Networking, Storage and Analysis

SCHEDULE: NOV 13-19, 2010

An Evaluation of Fault-Tolerance Techniques for Exascale Systems

SESSION: Research Poster Reception

EVENT TYPE: Poster

TIME: 5:15PM - 7:00PM

AUTHOR(S): Kurt B. Ferreira, Rolf Riesen

ROOM: Main Lobby

ABSTRACT:
As High-End Computing machines continue to grow, issues such as fault tolerance and reliability limit application scalability. In this work we evaluate the suitability of three well-known techniques that allow applications to ensure progress across a wide variety of faults: coordinated checkpointing, node-level replication, and uncoordinated checkpointing with sender-side message logging. For each method we outline the technique's limitations and overheads and point out scenarios in which the method would be suitable for exascale systems.

Chair/Author Details:

Kurt B. Ferreira - Sandia National Laboratories

Rolf Riesen - Sandia National Laboratories

Sponsors: IEEE, ACM