VirVarSeq: a low-frequency virus variant detection pipeline for Illumina sequencing using adaptive base-calling accuracy filtering

(2015) BIOINFORMATICS. 31(1). p.94-101
Project
Bioinformatics: from nucleotids to networks (N2N)
Abstract
Motivation: In virology, massively parallel sequencing (MPS) opens many opportunities for studying viral quasi-species, e.g. in HIV-1- and HCV-infected patients. This is essential for understanding pathways to resistance, which can substantially improve treatment. Although MPS platforms allow in-depth characterization of sequence variation, their measurements still involve substantial technical noise. For Illumina sequencing, single base substitutions are the main error source and impede powerful assessment of low-frequency mutations. Fortunately, base calls are complemented with quality scores (Qs) that are useful for differentiating errors from the real low-frequency mutations.

Results: A variant calling tool, Q-cpileup, is proposed, which exploits the Qs of nucleotides in a filtering strategy to increase specificity. The tool is imbedded in an open-source pipeline, VirVarSeq, which allows variant calling starting from fastq files. Using both plasmid mixtures and clinical samples, we show that Q-cpileup is able to reduce the number of false-positive findings. The filtering strategy is adaptive and provides an optimized threshold for individual samples in each sequencing run. Additionally, linkage information is kept between single-nucleotide polymorphisms as variants are called at the codon level. This enables virologists to have an immediate biological interpretation of the reported variants with respect to their antiviral drug responses. A comparison with existing SNP caller tools reveals that calling variants at the codon level with Q-cpileup results in an outstanding sensitivity while maintaining a good specificity for variants with frequencies down to 0.5%.
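
As a rough illustration of the idea in the abstract (filtering on base quality scores and counting variants per codon), the following Python sketch may help. It is not the Q-cpileup implementation: the function names, the toy reads, the pre-aligned read layout, and the fixed threshold `min_q` are all assumptions made for this example, and the adaptive, per-sample threshold selection described in the abstract is not reproduced here.

```python
from collections import Counter


def phred_to_error_prob(q):
    """Convert a Phred quality score Q to the probability that the base call is wrong."""
    return 10 ** (-q / 10)


def codon_frequencies(reads, codon_start, min_q):
    """Return codon frequencies at one codon position after quality filtering.

    reads: list of (sequence, qualities) pairs. As a simplifying assumption,
           every read is already aligned so that index 0 is the same
           reference position in all reads.
    codon_start: 0-based index of the first base of the codon.
    min_q: hypothetical fixed threshold; a codon is kept only if all three
           of its bases have quality >= min_q.
    """
    counts = Counter()
    for seq, quals in reads:
        codon = seq[codon_start:codon_start + 3]
        codon_quals = quals[codon_start:codon_start + 3]
        if len(codon) < 3:
            continue  # read does not cover the full codon
        if min(codon_quals) >= min_q:
            counts[codon] += 1
    total = sum(counts.values())
    return {codon: n / total for codon, n in counts.items()} if total else {}


# Toy data: four reads covering two codons (ATG, then AAA or AGA).
reads = [
    ("ATGAAA", [35, 36, 37, 38, 39, 40]),
    ("ATGAAA", [35, 36, 37, 38, 39, 40]),
    ("ATGAGA", [35, 36, 37, 38, 12, 40]),  # mismatch with a low-quality base
    ("ATGAGA", [35, 36, 37, 38, 39, 40]),  # mismatch with high-quality bases
]

unfiltered = codon_frequencies(reads, codon_start=3, min_q=0)
filtered = codon_frequencies(reads, codon_start=3, min_q=30)
print(unfiltered)
print(filtered)
```

In this toy example the low-quality `AGA` read is dropped once the threshold is applied, so the apparent frequency of the variant codon falls. Counting whole codons rather than single bases is what keeps substitutions within the same codon linked.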

Citation

Chicago
Verbist, Bie, Kim Thys, Joke Reumers, Yves Wetzels, Koen Van der Borght, Willem Talloen, Jeroen Aerssens, Lieven Clement, and Olivier Thas. 2015. “VirVarSeq: a Low-frequency Virus Variant Detection Pipeline for Illumina Sequencing Using Adaptive Base-calling Accuracy Filtering.” Bioinformatics 31 (1): 94–101.
APA
Verbist, B., Thys, K., Reumers, J., Wetzels, Y., Van der Borght, K., Talloen, W., Aerssens, J., et al. (2015). VirVarSeq: a low-frequency virus variant detection pipeline for Illumina sequencing using adaptive base-calling accuracy filtering. BIOINFORMATICS, 31(1), 94–101.
Vancouver
1. Verbist B, Thys K, Reumers J, Wetzels Y, Van der Borght K, Talloen W, et al. VirVarSeq: a low-frequency virus variant detection pipeline for Illumina sequencing using adaptive base-calling accuracy filtering. BIOINFORMATICS. 2015;31(1):94–101.
MLA
Verbist, Bie, Kim Thys, Joke Reumers, et al. “VirVarSeq: a Low-frequency Virus Variant Detection Pipeline for Illumina Sequencing Using Adaptive Base-calling Accuracy Filtering.” BIOINFORMATICS 31.1 (2015): 94–101. Print.
@article{7180112,
  abstract     = {Motivation: In virology, massively parallel sequencing (MPS) opens many opportunities for studying viral quasi-species, e.g. in HIV-1- and HCV-infected patients. This is essential for understanding pathways to resistance, which can substantially improve treatment. Although MPS platforms allow in-depth characterization of sequence variation, their measurements still involve substantial technical noise. For Illumina sequencing, single base substitutions are the main error source and impede powerful assessment of low-frequency mutations. Fortunately, base calls are complemented with quality scores (Qs) that are useful for differentiating errors from the real low-frequency mutations. 
Results: A variant calling tool, Q-cpileup, is proposed, which exploits the Qs of nucleotides in a filtering strategy to increase specificity. The tool is imbedded in an open-source pipeline, VirVarSeq, which allows variant calling starting from fastq files. Using both plasmid mixtures and clinical samples, we show that Q-cpileup is able to reduce the number of false-positive findings. The filtering strategy is adaptive and provides an optimized threshold for individual samples in each sequencing run. Additionally, linkage information is kept between single-nucleotide polymorphisms as variants are called at the codon level. This enables virologists to have an immediate biological interpretation of the reported variants with respect to their antiviral drug responses. A comparison with existing SNP caller tools reveals that calling variants at the codon level with Q-cpileup results in an outstanding sensitivity while maintaining a good specificity for variants with frequencies down to 0.5\%.},
  author       = {Verbist, Bie and Thys, Kim and Reumers, Joke and Wetzels, Yves and Van der Borght, Koen and Talloen, Willem and Aerssens, Jeroen and Clement, Lieven and Thas, Olivier},
  issn         = {1367-4803},
  journal      = {BIOINFORMATICS},
  language     = {eng},
  number       = {1},
  pages        = {94--101},
  title        = {VirVarSeq: a low-frequency virus variant detection pipeline for Illumina sequencing using adaptive base-calling accuracy filtering},
  url          = {http://dx.doi.org/10.1093/bioinformatics/btu587},
  volume       = {31},
  year         = {2015},
}
