
Halvade-RNA : parallel variant calling from transcriptomic data using MapReduce

(2017) PLOS ONE. 12(3).
Author
Dries Decap, Joke Reumers, Charlotte Herzeel, Pascal Costanza and Jan Fostier
Abstract
Given the current cost-effectiveness of next-generation sequencing, the amount of DNA-seq and RNA-seq data generated is ever increasing. One of the primary objectives of NGS experiments is calling genetic variants. While highly accurate, most variant calling pipelines are not optimized to run efficiently on large data sets. However, as variant calling in genomic data has become common practice, several methods have been proposed to reduce runtime for DNA-seq analysis through the use of parallel computing. Determining the effectively expressed variants from transcriptomics (RNA-seq) data has only recently become possible, and as such does not yet benefit from efficiently parallelized workflows. We introduce Halvade-RNA, a parallel, multi-node RNA-seq variant calling pipeline based on the GATK Best Practices recommendations. Halvade-RNA makes use of the MapReduce programming model to create and manage parallel data streams on which multiple instances of existing tools such as STAR and GATK operate concurrently. Whereas the single-threaded processing of a typical RNA-seq sample requires ~28 h, Halvade-RNA reduces this runtime to ~2 h using a small cluster with two 20-core machines. Even on a single, multi-core workstation, Halvade-RNA can significantly reduce runtime compared to using multi-threading, thus providing for a more cost-effective processing of RNA-seq data. Halvade-RNA is written in Java and uses the Hadoop MapReduce 2.0 API. It supports a wide range of distributions of Hadoop, including Cloudera and Amazon EMR.
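The runtimes quoted in the abstract imply a rough speedup and parallel efficiency, which the short Python sketch below works out. It is illustrative arithmetic only, not part of Halvade-RNA: the function names are hypothetical, the 28 h and 2 h figures are the approximate values from the abstract, and it assumes that all 2 × 20 = 40 cores were in use and that the single-threaded baseline ran on comparable hardware.

```python
def speedup(serial_hours: float, parallel_hours: float) -> float:
    """Ratio of single-threaded runtime to parallel runtime."""
    return serial_hours / parallel_hours


def parallel_efficiency(serial_hours: float, parallel_hours: float, cores: int) -> float:
    """Speedup divided by the number of cores (1.0 would be ideal scaling)."""
    return speedup(serial_hours, parallel_hours) / cores


# Approximate figures taken from the abstract.
SERIAL_HOURS = 28.0    # single-threaded processing of a typical RNA-seq sample
PARALLEL_HOURS = 2.0   # Halvade-RNA on the small cluster
TOTAL_CORES = 2 * 20   # two 20-core machines (assumed fully used)

s = speedup(SERIAL_HOURS, PARALLEL_HOURS)
e = parallel_efficiency(SERIAL_HOURS, PARALLEL_HOURS, TOTAL_CORES)
print(f"speedup ~ {s:.0f}x, parallel efficiency ~ {e:.0%}")
```

Under these assumptions the figures correspond to a speedup of about 14× on 40 cores, that is, a parallel efficiency of roughly 35%. Because both runtimes are approximate, the result should be read as an order-of-magnitude estimate.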
Keywords
IBCN

Downloads

  • 6888.pdf (full text, open access, PDF, 951.40 KB)

Citation

Please use this url to cite or link to this publication:

Chicago
Decap, Dries, Joke Reumers, Charlotte Herzeel, Pascal Costanza, and Jan Fostier. 2017. “Halvade-RNA : Parallel Variant Calling from Transcriptomic Data Using MapReduce.” PLOS ONE 12 (3).
APA
Decap, D., Reumers, J., Herzeel, C., Costanza, P., & Fostier, J. (2017). Halvade-RNA : parallel variant calling from transcriptomic data using MapReduce. PLOS ONE, 12(3).
Vancouver
1. Decap D, Reumers J, Herzeel C, Costanza P, Fostier J. Halvade-RNA : parallel variant calling from transcriptomic data using MapReduce. PLOS ONE. 2017;12(3).
MLA
Decap, Dries, Joke Reumers, Charlotte Herzeel, et al. “Halvade-RNA : Parallel Variant Calling from Transcriptomic Data Using MapReduce.” PLOS ONE 12.3 (2017): n. pag. Print.
@article{8521294,
  abstract     = {Given the current cost-effectiveness of next-generation sequencing, the amount of DNA-seq and RNA-seq data generated is ever increasing. One of the primary objectives of NGS experiments is calling genetic variants. While highly accurate, most variant calling pipelines are not optimized to run efficiently on large data sets. However, as variant calling in genomic data has become common practice, several methods have been proposed to reduce runtime for DNA-seq analysis through the use of parallel computing. Determining the effectively expressed variants from transcriptomics (RNA-seq) data has only recently become possible, and as such does not yet benefit from efficiently parallelized workflows. We introduce Halvade-RNA, a parallel, multi-node RNA-seq variant calling pipeline based on the GATK Best Practices recommendations. Halvade-RNA makes use of the MapReduce programming model to create and manage parallel data streams on which multiple instances of existing tools such as STAR and GATK operate concurrently. Whereas the single-threaded processing of a typical RNA-seq sample requires approximately 28 h, Halvade-RNA reduces this runtime to approximately 2 h using a small cluster with two 20-core machines. Even on a single, multi-core workstation, Halvade-RNA can significantly reduce runtime compared to using multi-threading, thus providing for a more cost-effective processing of RNA-seq data. Halvade-RNA is written in Java and uses the Hadoop MapReduce 2.0 API. It supports a wide range of distributions of Hadoop, including Cloudera and Amazon EMR.},
  articleno    = {e0174575},
  author       = {Decap, Dries and Reumers, Joke and Herzeel, Charlotte and Costanza, Pascal and Fostier, Jan},
  issn         = {1932-6203},
  journal      = {PLOS ONE},
  language     = {eng},
  number       = {3},
  title        = {Halvade-RNA : parallel variant calling from transcriptomic data using MapReduce},
  url          = {http://dx.doi.org/10.1371/journal.pone.0174575},
  volume       = {12},
  year         = {2017},
}
