Coordinated Infrastructure for Fault Tolerant Systems (CIFTS)
SESSION: Coordinated Infrastructure for Fault Tolerant Systems (CIFTS)
EVENT TYPE: Birds of a Feather
TIME: 12:15PM - 1:15PM
SESSION LEADER(S): Pete Beckman, Rinku Gupta, Al Geist
ABSTRACT: The Coordinated Infrastructure for Fault Tolerant Systems (CIFTS)
initiative provides a standard framework, through the Fault Tolerance
Backplane (FTB), where any component of the system, whether hardware or
software, can report or be notified of faults through a common interface,
thus enabling coordinated fault tolerance and recovery. The CIFTS BOFs at
SC'07, SC'08 and SC'09 drew an enthusiastic audience from industry,
academia, and the research community.
Building on this success, the SC'10 BOF has the following objectives:
1. Discuss the experiences gained and challenges faced in comprehensive fault
management on petascale leadership machines, and the impact of the CIFTS
framework in these environments. Teams working on integrating CIFTS in
high-end computing environments such as the Cray XT and IBM Blue Gene machines
will share their experiences. Teams developing CIFTS-enabled applications
and other widely used software such as MPICH2, Open MPI, MVAPICH2, BLCR,
and math libraries like FT-LA will also share their experiences.
2. Discuss the recent enhancements and planned developments for
CIFTS and solicit audience feedback. Discussion will also focus on
how system hardware and software, including high-level applications,
can adapt to coordinated fault tolerance infrastructures such as CIFTS,
and on which services such frameworks should provide, and how, for
emerging extreme-scale architectures.
3. Bring together individuals responsible for exascale computing
infrastructures who are interested in developing fault tolerance
specifically for these environments.
Session Leader Details:
Pete Beckman (Primary Session Leader) - Argonne National Laboratory
Rinku Gupta (Secondary Session Leader) - Argonne National Laboratory
Al Geist (Secondary Session Leader) - Oak Ridge National Laboratory