Scale and Concurrency of GIGA+: File System Directories with Millions of Files
SESSION: Doctoral Research Showcase II (Workflows and Parallel and Distributed IO Optimization)
EVENT TYPE: Doctoral Research Showcase
TIME: 2:24PM - 2:42PM
SESSION CHAIR: Sadaf R. Alam
ABSTRACT: Over the last decade, file system evolution has favored scaling for large files over scaling for large numbers of files. Two forces now call for a change: new workloads that generate large numbers of small I/O accesses at high speed, and dramatically increasing application-level parallelism. Data-intensive HPC applications have a growing need for POSIX-like file systems that hold trillions of files and for directories that hold billions of files, and most scalable file systems are ill-equipped to meet these new requirements.
This thesis proposes to understand the tradeoffs in scaling traditional file system directories to store billions of files and sustain hundreds of thousands of concurrent mutations per second.
We will also explore the challenges of using such large, mutating directories with the existing programming API and production file system implementations. Finally, we will explore how scalable directories can provide abstractions that simplify the development of data management systems.
Sadaf R. Alam (Chair) - Swiss National Supercomputing Centre