SC is the International Conference for
 High Performance Computing, Networking, Storage and Analysis

SCHEDULE: NOV 13-19, 2010

Scaling Highly-Parallel Data-Intensive Supercomputing Applications on a Parallel Clustered Filesystem

SESSION: Storage Challenge Presentations

EVENT TYPE: Storage Challenge

TIME: 10:30AM - 11:00AM

SESSION CHAIR: Alan Sussman

ROOM: 388

ABSTRACT:
A new class of data-intensive supercomputing applications involves processing massive amounts of data, with a greater focus on semantically transforming the data. This class of applications is embarrassingly parallel and well suited to the MapReduce programming framework, which lets users perform large-scale data analysis while the runtime handles the system architecture, data partitioning and task scheduling. In this paper, we demonstrate a business intelligence application running on GPFS over a cluster of commodity machines and direct-attached storage. The architecture maximizes storage performance by using five innovative optimizations: (a) Locality awareness to allow compute jobs to be scheduled close to data; (b) Metablocks that allow large and small block sizes to co-exist in the same file system; (c) Write affinity that allows applications to dictate the layout of files on different nodes; (d) Pipelined replication to maximize use of network bandwidth; (e) Distributed recovery to minimize the effect of failures.
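
As a rough illustration of optimization (a), locality awareness, the following Python sketch assigns one task per data block and prefers a node that already stores a replica of that block. It is not the GPFS or MapReduce scheduler described in the paper; the function name `schedule_tasks`, the `block_locations` map, and the `slots_per_node` map are hypothetical names chosen for this example, and the greedy policy is an assumption made only to keep the sketch short.

```python
from collections import defaultdict


def schedule_tasks(block_locations, slots_per_node):
    """Greedy locality-aware assignment (illustrative sketch only).

    block_locations: dict mapping block id -> list of nodes holding a replica
    slots_per_node:  dict mapping node name -> number of task slots
    Returns a dict mapping block id -> (node, is_local).
    """
    load = defaultdict(int)          # tasks assigned to each node so far
    nodes = sorted(slots_per_node)   # fixed order keeps the result deterministic
    assignment = {}

    for block in sorted(block_locations):
        # Prefer replica holders that still have a free slot.
        local = [
            n for n in block_locations[block]
            if n in slots_per_node and load[n] < slots_per_node[n]
        ]
        if local:
            node = min(local, key=lambda n: (load[n], n))
            is_local = True
        else:
            # Fall back to the least-loaded node; the data must be read remotely.
            free = [n for n in nodes if load[n] < slots_per_node[n]]
            if not free:
                raise RuntimeError("no free task slots left in the cluster")
            node = min(free, key=lambda n: (load[n], n))
            is_local = False

        load[node] += 1
        assignment[block] = (node, is_local)

    return assignment
```

In this sketch a task falls back to a remote node only when every replica holder is full. A production scheduler would also have to weigh network topology, node failures, and fairness between jobs, none of which is modeled here.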

Sponsors: IEEE, ACM