ABSTRACT: The MapReduce framework has become a popular and powerful tool to process large datasets in parallel over a cluster of computing nodes. Currently, there are many flavors of implementations of MapReduce, among which the most popular is the Hadoop implementation in Java. However, these implementations either rely on third-party file systems for across-computer-node communication or are difficult to implement with socket programming or communication libraries such as MPI. To address these challenges, we investigated utilizing the X10 language to implement MapReduce and tested it with the word-count use case. The key performance factor in implementing MapReduce is data moving across different computer nodes. We find that X10’s partitioned global address space
(PGAS) allows for high programming productivity in dealing with across-node
communication; such as distributed arrays, thus, a major challenge with MapReduce implementations is easily solved. We tested two main implementations: the first utilizes the HashMap data structure and the second a Rail with elements consisting of a string and integer pair. The performance of these two implementations are analyzed and discussed.
I shall present my work with a poster that demonstrates the design, implementation and testing results of this research. The design and implementation aspects shall be presented with diagrams depicting the two main implementations, as well an analysis of the benefits of utilizing X10 in this regard. The final result shall be accompanied with graphs and an analysis of the performance, as well as the pros and cons of utilizing X10 to implement a MapReduce word count use case.
Han Dong - University of Maryland Baltimore County