ABSTRACT: Many MPI libraries suffer from software
bugs that severely reduce the productivity of a large number of
users. This paper presents a new method called FlowChecker for
detecting communication-related bugs in MPI libraries. The main
idea is to extract program intentions of message passing (MP-intentions),
and to check whether these MP-intentions are fulfilled
correctly by the underlying MPI libraries, i.e., whether messages
are delivered correctly from specified sources to specified destinations.
If not, FlowChecker reports the bugs and provides
diagnostic information.
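
The checking idea above can be illustrated with a minimal sketch. This is not FlowChecker's implementation: the names `Transfer`, `transfer`, and `check_intentions` are hypothetical, and the sketch assumes that both the intended transfers and the transfers actually observed are already available as (source rank, destination rank, tag, payload checksum) records. It only shows the comparison step, i.e., deciding whether each intended message arrived intact at its specified destination.

```python
from collections import Counter
from dataclasses import dataclass
import zlib


@dataclass(frozen=True)
class Transfer:
    """One message transfer (hypothetical record format, for illustration)."""
    src: int       # sending rank
    dst: int       # receiving rank
    tag: int       # message tag
    checksum: int  # CRC32 of the payload, used to spot corrupted data


def transfer(src: int, dst: int, tag: int, payload: bytes) -> Transfer:
    """Build a Transfer record from a raw payload."""
    return Transfer(src, dst, tag, zlib.crc32(payload))


def check_intentions(intended, observed):
    """Compare intended transfers with observed deliveries.

    Returns (unfulfilled, unexpected):
      unfulfilled: intended transfers with no matching delivery
      unexpected:  deliveries that match no intended transfer
    Multisets are used so repeated identical messages are counted.
    """
    want = Counter(intended)
    got = Counter(observed)
    unfulfilled = list((want - got).elements())
    unexpected = list((got - want).elements())
    return unfulfilled, unexpected


if __name__ == "__main__":
    # Rank 0 intends to send to rank 1, but the message lands on rank 2.
    intended = [transfer(0, 1, 7, b"abc")]
    observed = [transfer(0, 2, 7, b"abc")]
    unfulfilled, unexpected = check_intentions(intended, observed)
    print("unfulfilled:", unfulfilled)
    print("unexpected:", unexpected)
```

In this sketch a misrouted or corrupted message shows up twice, once as an unfulfilled intention and once as an unexpected delivery, and that pair is the kind of diagnostic information a checker could report.
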
We have built a FlowChecker prototype on Linux and evaluated
it with five real-world bug cases in three widely used
MPI libraries: Open MPI, MPICH2, and MVAPICH2.
Our experimental results show that FlowChecker
detects all five evaluated bug cases and provides useful diagnostic
information. Additionally, our experiments with HPL and NPB
show that FlowChecker incurs low runtime overhead (0.9% to 9.7%
across the three MPI libraries).
Chair/Author Details:
Martin Schulz (Chair) - Lawrence Livermore National Laboratory