SC is the International Conference for
 High Performance Computing, Networking, Storage and Analysis

SCHEDULE: NOV 13-19, 2010

Coordinated Infrastructure for Fault Tolerant Systems (CIFTS)

SESSION: Coordinated Infrastructure for Fault Tolerant Systems (CIFTS)

EVENT TYPE: Birds of a Feather

TIME: 12:15PM - 1:15PM

SESSION LEADER(S): Pete Beckman, Rinku Gupta, Al Geist


The Coordinated Infrastructure for Fault Tolerant Systems (CIFTS) initiative provides a standard framework, through the Fault Tolerance Backplane (FTB), in which any component of the system, whether hardware or software, can report faults or be notified of them through a common interface, thus enabling coordinated fault tolerance and recovery. SC'07, SC'08 and SC'09 saw an enthusiastic audience of industry leaders, academics, and researchers participate in the CIFTS BOF. Building on that success, the objectives of the SC'10 BOF are:

1. Discuss the experiences gained and challenges faced in comprehensive fault management on petascale leadership machines, and the impact of the CIFTS framework in these environments. Teams integrating CIFTS into high-end computing environments such as the Cray XT and IBM Blue Gene machines will share their experiences, as will teams developing CIFTS-enabled applications and other widely used software such as MPICH2, Open MPI, MVAPICH2, BLCR, and math libraries like FT-LA.

2. Discuss recent enhancements and planned developments for CIFTS, and solicit audience feedback. Discussion will also focus on how system hardware and software, including high-level applications, can adapt to coordinated fault tolerance infrastructures such as CIFTS, and on which services such frameworks should provide, and how, for emerging extreme-scale architectures.

3. Bring together individuals responsible for exascale computing infrastructures who have an interest in developing fault tolerance specifically for these environments.

Session Leader Details:

Pete Beckman (Primary Session Leader) - Argonne National Laboratory

Rinku Gupta (Secondary Session Leader) - Argonne National Laboratory

Al Geist (Secondary Session Leader) - Oak Ridge National Laboratory

