Scheduling Data Intensive Applications Based on Multi-Source Parallel Data Retrievals
SESSION: Doctoral Research Showcase II (Workflows and Parallel and Distributed IO Optimization)
EVENT TYPE: Doctoral Research Showcase
TIME: 2:06PM - 2:24PM
SESSION CHAIR: Sadaf R. Alam
ABSTRACT: Scientific experiments carried out collaboratively by researchers result in large amounts of data being replicated globally. Unlike traditional approaches, in which a single best data source is selected for retrieving data, we can leverage the presence of these distributed data sources when scheduling application workflows in distributed systems. Moreover, when the scheduling system has cost and time constraints, selecting data sources, distributed compute resources, tasks, and data files so as to minimize total cost and time is challenging.
I propose a probe-based multi-source parallel data retrieval technique that produces online and offline schedules to minimize both the time and the cost of executing application workflows on distributed resources. I have implemented a workflow management system that uses these scheduling heuristics to execute several real-world applications: image registration, intrusion detection, distributed evolutionary algorithms, and ECG analysis. Through live demonstrations at prestigious conferences, I have shown the impact of my approach to the scientific community.
Sadaf R. Alam (Chair) - Swiss National Supercomputing Centre