BioHDF: Open Binary File Formats for Next-Generation DNA Sequencing Data
SESSION: Research Poster Reception
EVENT TYPE: Poster
TIME: 5:15PM - 7:00PM
AUTHOR(S): Dana Robinson, Mike Folk, Mark Welsh, Todd Smith
ABSTRACT: The huge volume of data produced by the latest generation of sequencing technologies presents significant challenges in data transmission, storage, bioinformatics analysis, visualization, and archiving. Widespread adoption of next-generation DNA sequencing (NGS) will be hindered if bioinformatics software cannot scale to meet these challenges. The BioHDF project (http://www.biohdf.org) aims to address several of these data storage and manipulation challenges by using the established, open-source HDF5 (http://www.hdfgroup.org) binary file format to store NGS data.
BioHDF extends HDF5 data structures and library routines with new features to support the high-performance data storage and computation requirements of next-generation sequencing. The open-source, BSD-licensed tools support the storage of sequences, their alignments against reference data sources, and annotations such as SNP or splice variation analysis. Multiple NGS platforms are supported, including Applied Biosystems SOLiD, Illumina Genome Analyzer, Roche 454, and Helicos.
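As a rough illustration of the idea of keeping reads and their alignments together in one HDF5 file, the following minimal sketch uses the general-purpose h5py and NumPy packages. It does not use the BioHDF tools or reproduce the BioHDF schema: the group and dataset names (reads/sequence, alignments/chr1), the record fields, and the helper functions are all hypothetical, chosen only for this example.

```python
import h5py
import numpy as np

# Hypothetical alignment record; NOT the BioHDF schema.
ALIGN_DTYPE = np.dtype([
    ("read_index", "<u4"),  # row of the read in reads/sequence
    ("ref_start", "<u4"),   # start position on the reference
    ("strand", "S1"),       # b"+" or b"-"
])

def build_store(reads, alignments):
    """Create an in-memory HDF5 file holding reads and their alignments."""
    # driver="core" with backing_store=False keeps the file in memory only.
    f = h5py.File("sketch.h5", "w", driver="core", backing_store=False)
    # Fixed-length byte strings; intermediate groups are created as needed.
    f.create_dataset("reads/sequence", data=np.array(reads, dtype="S"))
    aln = f.create_dataset("alignments/chr1",
                           data=np.array(alignments, dtype=ALIGN_DTYPE))
    aln.attrs["reference"] = "chr1"
    return f

def reads_starting_in(f, start, end):
    """Return sequences whose alignment starts in [start, end)."""
    aln = f["alignments/chr1"][...]
    seqs = f["reads/sequence"]
    hits = aln[(aln["ref_start"] >= start) & (aln["ref_start"] < end)]
    return [seqs[int(i)].decode("ascii") for i in hits["read_index"]]

if __name__ == "__main__":
    store = build_store(
        ["ACGT", "GGCC", "TTAA"],
        [(0, 10, b"+"), (1, 50, b"-"), (2, 120, b"+")],
    )
    print(reads_starting_in(store, 0, 100))
    store.close()
```

Storing alignments as a table of fixed-size records that refer to reads by index is one plausible design; it lets a region query touch only the alignment table before fetching the matching sequences.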