SC is the International Conference for High Performance Computing, Networking, Storage and Analysis

SCHEDULE: NOV 13-19, 2010

Data Sharing Options for Scientific Workflows on Amazon EC2

SESSION: HPC on Clouds


TIME: 2:00PM - 2:30PM

SESSION CHAIR: David Abramson

AUTHOR(S): Gideon Juve, Ewa Deelman, Karan Vahi, Gaurang Mehta, Benjamin P. Berman, Bruce Berriman, Phil Maechling


Efficient data management is a key component in achieving good performance for scientific workflows in distributed environments. Workflow applications typically communicate data between tasks using files. When tasks are distributed, these files are either transferred from one computational node to another, or accessed through a shared storage system. In grids and clusters, workflow data is often stored on network and parallel file systems. In this paper we investigate some of the ways in which data can be managed for workflows in the cloud. We ran experiments using three typical workflow applications on Amazon’s EC2. We discuss the various storage and file systems we used, describe the issues and problems we encountered deploying them on EC2, and analyze the resulting performance and cost of the workflows.

Chair/Author Details:

David Abramson (Chair) - Monash University

Gideon Juve - ISI

Ewa Deelman - ISI

Karan Vahi - ISI

Gaurang Mehta - ISI

Benjamin P. Berman - University of Southern California

Bruce Berriman - California Institute of Technology

Phil Maechling - Southern California Earthquake Center


The full paper can be found in the ACM Digital Library and IEEE Computer Society

Sponsors: IEEE, ACM