SC is the International Conference for High Performance Computing, Networking, Storage and Analysis

SCHEDULE: NOV 13-19, 2010

BioHDF: Open Binary File Formats for Next-Generation DNA Sequencing Data

SESSION: Research Poster Reception


TIME: 5:15PM - 7:00PM

AUTHOR(S): Dana Robinson, Mike Folk, Mark Welsh, Todd Smith

ROOM: Main Lobby

The huge volume of data produced by the latest generation of sequencing technologies presents significant challenges in data transmission, storage, bioinformatics analysis, visualization, and archiving. Widespread adoption of next-generation DNA sequencing (NGS) will be hindered if bioinformatics software cannot scale to meet these challenges. The BioHDF project aims to solve some of these data storage and manipulation challenges by using the established, open-source HDF5 binary file format to store NGS data. BioHDF extends HDF5 data structures and library routines with new features to support the high-performance data storage and computation requirements of next-generation sequencing. The open-source, BSD-licensed tools support the storage of sequences, their alignments against reference data sources, and annotations such as SNP or splice variation analysis. Multiple NGS platforms are supported, including Applied Biosystems SOLiD, Illumina Genome Analyzer, Roche 454, and Helicos.

Chair/Author Details:

Dana Robinson - HDF Group

Mike Folk - HDF Group

Mark Welsh - Geospiza

Todd Smith - Geospiza

