
Jabba: hybrid error correction for long sequencing reads using maximal exact matches

Giles Miclotte (UGent), Mahdi Heydari (UGent), Piet Demeester (UGent), Pieter Audenaert (UGent) and Jan Fostier (UGent)
Project
01MR0410
Abstract
Third generation sequencing platforms produce longer reads with higher error rates than second generation sequencing technologies. While the improved read length can provide useful information for downstream analysis, underlying algorithms are challenged by the high error rate. Error correction methods in which accurate short reads are used to correct noisy long reads appear to be attractive to generate high-quality long reads. Methods that align short reads to long reads do not optimally use the information contained in the second generation data, and suffer from large runtimes. Recently, a new hybrid error correcting method has been proposed, where the second generation data is first assembled into a de Bruijn graph, on which the long reads are then aligned. In this context we present Jabba, a hybrid method to correct long third generation reads by mapping them on a corrected de Bruijn graph that was constructed from second generation data. Unique to our method is that this mapping is constructed with a seed and extend methodology, using maximal exact matches as seeds. In addition to benchmark results, certain theoretical results concerning the possibilities and limitations of the use of maximal exact matches in the context of third generation reads are presented.
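To make the notion of a maximal exact match (MEM) concrete, the following is a minimal brute-force sketch in Python. It is not the Jabba implementation: it compares a read against a plain reference string rather than against nodes of a de Bruijn graph, and the function name `maximal_exact_matches` and the parameter `min_len` are hypothetical names chosen for illustration. A match is reported when it cannot be extended to the left or to the right without hitting a mismatch or a sequence boundary.

```python
from typing import List, Tuple


def maximal_exact_matches(read: str, ref: str, min_len: int) -> List[Tuple[int, int, int]]:
    """Return all MEMs of length >= min_len as (read_pos, ref_pos, length).

    Brute-force illustration, O(len(read) * len(ref)) start positions;
    practical tools use index structures such as suffix arrays instead.
    """
    mems = []
    for i in range(len(read)):
        for j in range(len(ref)):
            if read[i] != ref[j]:
                continue
            # Left-maximal: the preceding characters differ, or we are at a boundary.
            if i > 0 and j > 0 and read[i - 1] == ref[j - 1]:
                continue
            # Extend to the right until a mismatch or the end of either sequence.
            length = 0
            while (i + length < len(read) and j + length < len(ref)
                   and read[i + length] == ref[j + length]):
                length += 1
            if length >= min_len:
                mems.append((i, j, length))
    return mems


if __name__ == "__main__":
    # A noisy read with two substituted bases in the middle.
    noisy_read = "ACGTTTGCA"
    reference = "ACGTAAGCA"
    for read_pos, ref_pos, length in maximal_exact_matches(noisy_read, reference, 3):
        print(read_pos, ref_pos, noisy_read[read_pos:read_pos + length])
```

In a seed-and-extend scheme such matches would serve as seeds, and the regions between consecutive seeds are where the errors to be corrected are located. How seeds are chained and extended on the graph is not covered by this sketch.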
Keywords
CONSENSUS, IBCN, GENOMES, ACCURATE, ALGORITHMS, Sequence analysis, Error correction, Maximal exact matches, SIMULATOR, ALIGNMENT, de Bruijn graph, EXTREME VALUE THEORY

Downloads

  • 6310 i.pdf: full text | open access | PDF | 284.76 KB
  • (...).pdf: full text | UGent only | PDF | 8.53 MB

Citation

Chicago
Miclotte, Giles, Mahdi Heydari, Piet Demeester, Pieter Audenaert, and Jan Fostier. 2015. “Jabba: Hybrid Error Correction for Long Sequencing Reads Using Maximal Exact Matches.” In Lecture Notes in Bioinformatics, 9289:175–188. Springer.
APA
Miclotte, G., Heydari, M., Demeester, P., Audenaert, P., & Fostier, J. (2015). Jabba: hybrid error correction for long sequencing reads using maximal exact matches. Lecture Notes in Bioinformatics (Vol. 9289, pp. 175–188). Presented at the 15th International Workshop on Algorithms in Bioinformatics (WABI 2015), Springer.
Vancouver
1. Miclotte G, Heydari M, Demeester P, Audenaert P, Fostier J. Jabba: hybrid error correction for long sequencing reads using maximal exact matches. Lecture Notes in Bioinformatics. Springer; 2015. p. 175–88.
MLA
Miclotte, Giles et al. “Jabba: Hybrid Error Correction for Long Sequencing Reads Using Maximal Exact Matches.” Lecture Notes in Bioinformatics. Vol. 9289. Springer, 2015. 175–188. Print.
@inproceedings{6990407,
  abstract     = {Third generation sequencing platforms produce longer reads with higher error rates than second generation sequencing technologies. While the improved read length can provide useful information for downstream analysis, underlying algorithms are challenged by the high error rate. Error correction methods in which accurate short reads are used to correct noisy long reads appear to be attractive to generate high-quality long reads. Methods that align short reads to long reads do not optimally use the information contained in the second generation data, and suffer from large runtimes. Recently, a new hybrid error correcting method has been proposed, where the second generation data is first assembled into a de Bruijn graph, on which the long reads are then aligned. In this context we present Jabba, a hybrid method to correct long third generation reads by mapping them on a corrected de Bruijn graph that was constructed from second generation data. Unique to our method is that this mapping is constructed with a seed and extend methodology, using maximal exact matches as seeds. In addition to benchmark results, certain theoretical results concerning the possibilities and limitations of the use of maximal exact matches in the context of third generation reads are presented.},
  author       = {Miclotte, Giles and Heydari, Mahdi and Demeester, Piet and Audenaert, Pieter and Fostier, Jan},
  booktitle    = {Lecture Notes in Bioinformatics},
  isbn         = {9783662482216},
  issn         = {0302-9743},
  keywords     = {CONSENSUS,IBCN,GENOMES,ACCURATE,ALGORITHMS,Sequence analysis,Error correction,Maximal exact matches,SIMULATOR,ALIGNMENT,de Bruijn graph,EXTREME VALUE THEORY},
  language     = {eng},
  location     = {Georgia Technol Inst, Atlanta, GA},
  pages        = {175--188},
  publisher    = {Springer},
  title        = {Jabba: hybrid error correction for long sequencing reads using maximal exact matches},
  volume       = {9289},
  year         = {2015},
}
