Differential gene expression analysis tools exhibit substandard performance for long non-coding RNA-sequencing data

Author
Alemu Takele Assefa, Katrijn De Paepe, Celine Everaert, Pieter Mestdagh, Olivier Thas, and Jo Vandesompele
Abstract
Background: Long non-coding RNAs (lncRNAs) are typically expressed at low levels and are inherently highly variable. This is a fundamental challenge for differential expression (DE) analysis. In this study, the performance of 25 pipelines for testing DE in RNA-seq data is comprehensively evaluated, with a particular focus on lncRNAs and low-abundance mRNAs. Fifteen performance metrics are used to evaluate DE tools and normalization methods using simulations and analyses of six diverse RNA-seq datasets.

Results: Gene expression data are simulated using non-parametric procedures in such a way that realistic levels of expression and variability are preserved in the simulated data. Throughout the assessment, results for mRNA and lncRNA were tracked separately. All the pipelines exhibit inferior performance for lncRNAs compared to mRNAs across all simulated scenarios and benchmark RNA-seq datasets. The substandard performance of DE tools for lncRNAs applies also to low-abundance mRNAs. No single tool uniformly outperformed the others. Variability, number of samples, and fraction of DE genes markedly influenced DE tool performance.

Conclusions: Overall, linear modeling with empirical Bayes moderation (limma) and a non-parametric approach (SAMSeq) showed good control of the false discovery rate and reasonable sensitivity. Of note, for achieving a sensitivity of at least 50%, more than 80 samples are required when studying expression levels in realistic settings such as in clinical cancer research. About half of the methods showed a substantial excess of false discoveries, making these methods unreliable for DE analysis and jeopardizing reproducible science. The detailed results of our study can be consulted through a user-friendly web application, giving guidance on selection of the optimal DE tool (http://statapps.ugent.be/tools/AppDGE/).
Keywords
RNA-seq, mRNA, lncRNA, Differential gene expression, FALSE DISCOVERY RATE, NONPARAMETRIC APPROACH, SEQ EXPERIMENTS, NORMALIZATION

Downloads

  • s13059-018-1466-5.pdf (full text | open access | PDF | 2.04 MB)

Citation

MLA
Assefa, Alemu Takele et al. “Differential Gene Expression Analysis Tools Exhibit Substandard Performance for Long Non-coding RNA-sequencing Data.” GENOME BIOLOGY 19 (2018): n. pag. Print.
APA
Assefa, A. T., De Paepe, K., Everaert, C., Mestdagh, P., Thas, O., & Vandesompele, J. (2018). Differential gene expression analysis tools exhibit substandard performance for long non-coding RNA-sequencing data. GENOME BIOLOGY, 19.
Chicago author-date
Assefa, Alemu Takele, Katrijn De Paepe, Celine Everaert, Pieter Mestdagh, Olivier Thas, and Jo Vandesompele. 2018. “Differential Gene Expression Analysis Tools Exhibit Substandard Performance for Long Non-coding RNA-sequencing Data.” Genome Biology 19.
Vancouver
1. Assefa AT, De Paepe K, Everaert C, Mestdagh P, Thas O, Vandesompele J. Differential gene expression analysis tools exhibit substandard performance for long non-coding RNA-sequencing data. GENOME BIOLOGY. 2018;19.
IEEE
[1] A. T. Assefa, K. De Paepe, C. Everaert, P. Mestdagh, O. Thas, and J. Vandesompele, “Differential gene expression analysis tools exhibit substandard performance for long non-coding RNA-sequencing data,” GENOME BIOLOGY, vol. 19, 2018.
BibTeX
@article{8599289,
  abstract     = {Background: Long non-coding RNAs (lncRNAs) are typically expressed at low levels and are inherently highly variable. This is a fundamental challenge for differential expression (DE) analysis. In this study, the performance of 25 pipelines for testing DE in RNA-seq data is comprehensively evaluated, with a particular focus on lncRNAs and low-abundance mRNAs. Fifteen performance metrics are used to evaluate DE tools and normalization methods using simulations and analyses of six diverse RNA-seq datasets. 
Results: Gene expression data are simulated using non-parametric procedures in such a way that realistic levels of expression and variability are preserved in the simulated data. Throughout the assessment, results for mRNA and lncRNA were tracked separately. All the pipelines exhibit inferior performance for lncRNAs compared to mRNAs across all simulated scenarios and benchmark RNA-seq datasets. The substandard performance of DE tools for lncRNAs applies also to low-abundance mRNAs. No single tool uniformly outperformed the others. Variability, number of samples, and fraction of DE genes markedly influenced DE tool performance. 
Conclusions: Overall, linear modeling with empirical Bayes moderation (limma) and a non-parametric approach (SAMSeq) showed good control of the false discovery rate and reasonable sensitivity. Of note, for achieving a sensitivity of at least 50%, more than 80 samples are required when studying expression levels in realistic settings such as in clinical cancer research. About half of the methods showed a substantial excess of false discoveries, making these methods unreliable for DE analysis and jeopardizing reproducible science. The detailed results of our study can be consulted through a user-friendly web application, giving guidance on selection of the optimal DE tool (http://statapps.ugent.be/tools/AppDGE/).},
  articleno    = {96},
  author       = {Assefa, Alemu Takele and De Paepe, Katrijn and Everaert, Celine and Mestdagh, Pieter and Thas, Olivier and Vandesompele, Jo},
  issn         = {1474-760X},
  journal      = {GENOME BIOLOGY},
  keywords     = {RNA-seq,mRNA,lncRNA,Differential gene expression,FALSE DISCOVERY RATE,NONPARAMETRIC APPROACH,SEQ EXPERIMENTS,NORMALIZATION},
  language     = {eng},
  pages        = {16},
  title        = {Differential gene expression analysis tools exhibit substandard performance for long non-coding RNA-sequencing data},
  url          = {http://dx.doi.org/10.1186/s13059-018-1466-5},
  volume       = {19},
  year         = {2018},
}
