Halvade: scalable sequence analysis with MapReduce

(2015) BIOINFORMATICS. 31(15). p.2482-2488
Abstract
Motivation: Post-sequencing DNA analysis typically consists of read mapping followed by variant calling. Especially for whole genome sequencing, this computational step is very time-consuming, even when using multithreading on a multi-core machine.

Results: We present Halvade, a framework that enables sequencing pipelines to be executed in parallel on a multi-node and/or multi-core compute infrastructure in a highly efficient manner. As an example, a DNA sequencing analysis pipeline for variant calling has been implemented according to the GATK Best Practices recommendations, supporting both whole genome and whole exome sequencing. Using a 15-node computer cluster with 360 CPU cores in total, Halvade processes the NA12878 dataset (human, 100 bp paired-end reads, 50x coverage) in <3 h with very high parallel efficiency. Even on a single, multi-core machine, Halvade attains a significant speedup compared with running the individual tools with multithreading.

Availability and implementation: Halvade is written in Java and uses the Hadoop MapReduce 2.0 API. It supports a wide range of distributions of Hadoop, including Cloudera and Amazon EMR.
Keywords
ALIGNMENT, FRAMEWORK, IBCN, GENOME ANALYSIS, CLOUD

Downloads

  • 6314.pdf (full text, open access, PDF, 319.81 KB)

Citation

Please use this url to cite or link to this publication:

MLA
Decap, Dries, et al. “Halvade: Scalable Sequence Analysis with MapReduce.” BIOINFORMATICS, vol. 31, no. 15, 2015, pp. 2482–88, doi:10.1093/bioinformatics/btv179.
APA
Decap, D., Reumers, J., Herzeel, C., Costanza, P., & Fostier, J. (2015). Halvade: scalable sequence analysis with MapReduce. BIOINFORMATICS, 31(15), 2482–2488. https://doi.org/10.1093/bioinformatics/btv179
Chicago author-date
Decap, Dries, Joke Reumers, Charlotte Herzeel, Pascal Costanza, and Jan Fostier. 2015. “Halvade: Scalable Sequence Analysis with MapReduce.” BIOINFORMATICS 31 (15): 2482–88. https://doi.org/10.1093/bioinformatics/btv179.
Chicago author-date (all authors)
Decap, Dries, Joke Reumers, Charlotte Herzeel, Pascal Costanza, and Jan Fostier. 2015. “Halvade: Scalable Sequence Analysis with MapReduce.” BIOINFORMATICS 31 (15): 2482–2488. doi:10.1093/bioinformatics/btv179.
Vancouver
1. Decap D, Reumers J, Herzeel C, Costanza P, Fostier J. Halvade: scalable sequence analysis with MapReduce. BIOINFORMATICS. 2015;31(15):2482–8.
IEEE
[1] D. Decap, J. Reumers, C. Herzeel, P. Costanza, and J. Fostier, “Halvade: scalable sequence analysis with MapReduce,” BIOINFORMATICS, vol. 31, no. 15, pp. 2482–2488, 2015.
BibTeX
@article{6990468,
  abstract     = {{Motivation: Post-sequencing DNA analysis typically consists of read mapping followed by variant calling. Especially for whole genome sequencing, this computational step is very time-consuming, even when using multithreading on a multi-core machine.
 
Results: We present Halvade, a framework that enables sequencing pipelines to be executed in parallel on a multi-node and/or multi-core compute infrastructure in a highly efficient manner. As an example, a DNA sequencing analysis pipeline for variant calling has been implemented according to the GATK Best Practices recommendations, supporting both whole genome and whole exome sequencing. Using a 15-node computer cluster with 360 CPU cores in total, Halvade processes the NA12878 dataset (human, 100 bp paired-end reads, 50x coverage) in <3 h with very high parallel efficiency. Even on a single, multi-core machine, Halvade attains a significant speedup compared with running the individual tools with multithreading.
 
Availability and implementation: Halvade is written in Java and uses the Hadoop MapReduce 2.0 API. It supports a wide range of distributions of Hadoop, including Cloudera and Amazon EMR.}},
  author       = {{Decap, Dries and Reumers, Joke and Herzeel, Charlotte and Costanza, Pascal and Fostier, Jan}},
  issn         = {{1367-4803}},
  journal      = {{BIOINFORMATICS}},
  keywords     = {{ALIGNMENT,FRAMEWORK,IBCN,GENOME ANALYSIS,CLOUD}},
  language     = {{eng}},
  number       = {{15}},
  pages        = {{2482--2488}},
  title        = {{Halvade: scalable sequence analysis with MapReduce}},
  url          = {{http://dx.doi.org/10.1093/bioinformatics/btv179}},
  volume       = {{31}},
  year         = {{2015}},
}
