
Performance analysis of a parallel, multi-node pipeline for DNA sequencing

Abstract
Post-sequencing DNA analysis typically consists of read mapping followed by variant calling and is very time-consuming, even on a multi-core machine. Recently, we proposed Halvade, a parallel, multi-node implementation of a DNA sequencing pipeline according to the GATK Best Practices recommendations. The MapReduce programming model is used to distribute the workload among different workers. In this paper, we study the impact of different hardware configurations on the performance of Halvade. Benchmarks indicate that, in particular, the lack of good multithreading capabilities in the existing tools (BWA, SAMtools, Picard, GATK) causes suboptimal scaling behavior. We demonstrate that it is possible to circumvent this bottleneck by using multiprocessing on high-memory machines rather than using multithreading. Using a 15-node cluster with 360 CPU cores in total, this results in a runtime of 1 h 31 min. Compared to a single-threaded runtime of approximately 12 days, this corresponds to an overall parallel efficiency of 53%.
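The efficiency figure in the abstract can be checked with a short calculation. The sketch below assumes the usual definitions (speedup = serial runtime / parallel runtime, efficiency = speedup / number of cores) and treats the single-threaded runtime as exactly 12 days, whereas the abstract only gives it approximately. The function name `parallel_efficiency` is illustrative and does not come from the paper.

```python
def parallel_efficiency(serial_minutes, parallel_minutes, cores):
    """Return (speedup, efficiency) under the standard definitions."""
    speedup = serial_minutes / parallel_minutes
    return speedup, speedup / cores


# Assumption: "approximately 12 days" is taken as exactly 12 days.
serial_minutes = 12 * 24 * 60      # 17280 minutes
parallel_minutes = 1 * 60 + 31     # 1 h 31 min = 91 minutes
cores = 360                        # 15-node cluster, 360 CPU cores in total

speedup, efficiency = parallel_efficiency(serial_minutes, parallel_minutes, cores)
print(f"speedup ~ {speedup:.0f}x, efficiency ~ {efficiency:.0%}")
# prints "speedup ~ 190x, efficiency ~ 53%"
```

Under these assumptions the result agrees with the 53% reported in the abstract.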
Keywords
IBCN

Downloads

  • (...).pdf | full text | UGent only | PDF | 532.64 KB
  • 6889 i.pdf | full text | open access | PDF | 327.89 KB

Citation


Chicago
Decap, Dries, Joke Reumers, Charlotte Herzeel, Pascal Costanza, and Jan Fostier. 2016. “Performance Analysis of a Parallel, Multi-node Pipeline for DNA Sequencing.” In PARALLEL PROCESSING AND APPLIED MATHEMATICS, PPAM 2015, PT II, 9574:233–242. Sep 06-09, 2015.
APA
Decap, D., Reumers, J., Herzeel, C., Costanza, P., & Fostier, J. (2016). Performance analysis of a parallel, multi-node pipeline for DNA sequencing. PARALLEL PROCESSING AND APPLIED MATHEMATICS, PPAM 2015, PT II (Vol. 9574, pp. 233–242). Presented at the 11th International Conference on Parallel Processing and Applied Mathematics (PPAM), Sep 06-09, 2015.
Vancouver
1.
Decap D, Reumers J, Herzeel C, Costanza P, Fostier J. Performance analysis of a parallel, multi-node pipeline for DNA sequencing. PARALLEL PROCESSING AND APPLIED MATHEMATICS, PPAM 2015, PT II. Sep 06-09, 2015; 2016. p. 233–42.
MLA
Decap, Dries, Joke Reumers, Charlotte Herzeel, et al. “Performance Analysis of a Parallel, Multi-node Pipeline for DNA Sequencing.” PARALLEL PROCESSING AND APPLIED MATHEMATICS, PPAM 2015, PT II. Vol. 9574. Sep 06-09, 2015, 2016. 233–242. Print.
BibTeX
@inproceedings{8521297,
  abstract     = {Post-sequencing DNA analysis typically consists of read mapping followed by variant calling and is very time-consuming, even on a multi-core machine. Recently, we proposed Halvade, a parallel, multi-node implementation of a DNA sequencing pipeline according to the GATK Best Practices recommendations. The MapReduce programming model is used to distribute the workload among different workers. In this paper, we study the impact of different hardware configurations on the performance of Halvade. Benchmarks indicate that, in particular, the lack of good multithreading capabilities in the existing tools (BWA, SAMtools, Picard, GATK) causes suboptimal scaling behavior. We demonstrate that it is possible to circumvent this bottleneck by using multiprocessing on high-memory machines rather than using multithreading. Using a 15-node cluster with 360 CPU cores in total, this results in a runtime of 1 h 31 min. Compared to a single-threaded runtime of approximately 12 days, this corresponds to an overall parallel efficiency of 53\%.},
  author       = {Decap, Dries and Reumers, Joke and Herzeel, Charlotte and Costanza, Pascal and Fostier, Jan},
  booktitle    = {PARALLEL PROCESSING AND APPLIED MATHEMATICS, PPAM 2015, PT II},
  isbn         = {978-3-319-32152-3},
  issn         = {0302-9743},
  language     = {eng},
  location     = {Krakow, POLAND},
  pages        = {233--242},
  title        = {Performance analysis of a parallel, multi-node pipeline for DNA sequencing},
  url          = {http://dx.doi.org/10.1007/978-3-319-32152-3\_22},
  volume       = {9574},
  year         = {2016},
}
