Ghent University Academic Bibliography


Jabba: hybrid error correction for long sequencing reads

Giles Miclotte UGent, Mahdi Heydari UGent, Piet Demeester UGent, Stephane Rombauts UGent, Yves Van de Peer UGent, Pieter Audenaert UGent and Jan Fostier UGent (2016) ALGORITHMS FOR MOLECULAR BIOLOGY. 11.
abstract
Background: Third generation sequencing platforms produce longer reads with higher error rates than second generation technologies. While the improved read length can provide useful information for downstream analysis, underlying algorithms are challenged by the high error rate. Error correction methods in which accurate short reads are used to correct noisy long reads appear to be attractive to generate high-quality long reads. Methods that align short reads to long reads do not optimally use the information contained in the second generation data, and suffer from large runtimes. Recently, a new hybrid error correcting method has been proposed, where the second generation data is first assembled into a de Bruijn graph, on which the long reads are then aligned. Results: In this context we present Jabba, a hybrid method to correct long third generation reads by mapping them on a corrected de Bruijn graph that was constructed from second generation data. Unique to our method is the use of a pseudo alignment approach with a seed-and-extend methodology, using maximal exact matches (MEMs) as seeds. In addition to benchmark results, certain theoretical results concerning the possibilities and limitations of the use of MEMs in the context of third generation reads are presented. Conclusion: Jabba produces highly reliable corrected reads: almost all corrected reads align to the reference, and these alignments have a very high identity. Many of the aligned reads are error-free. Additionally, Jabba corrects reads using a very low amount of CPU time. From this we conclude that pseudo alignment with MEMs is a fast and reliable method to map long highly erroneous sequences on a de Bruijn graph.
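The seeding step named in the abstract rests on maximal exact matches (MEMs): an exact match between a substring of the long read and a substring of a reference sequence that cannot be extended by one more character to the left or right. The following is a minimal sketch under stated assumptions. It compares the read against a plain reference string rather than against Jabba's de Bruijn graph, and it uses a naive diagonal scan rather than an index structure. The names `find_mems` and `min_len` are hypothetical and are not part of Jabba.

```python
from typing import List, Tuple


# Hypothetical helper for illustration only; not Jabba's actual implementation.
def find_mems(read: str, ref: str, min_len: int) -> List[Tuple[int, int, int]]:
    """Return (read_pos, ref_pos, length) for every maximal exact match of
    at least min_len characters between read and ref.

    Naive O(len(read) * len(ref)) scan: every MEM is a maximal run of equal
    characters along one diagonal d = ref_pos - read_pos, and vice versa.
    """
    mems = []
    n, m = len(read), len(ref)
    for d in range(-(n - 1), m):
        i = max(0, -d)  # first read position on this diagonal
        j = i + d       # corresponding reference position
        run_start = None
        while i < n and j < m:
            if read[i] == ref[j]:
                if run_start is None:
                    run_start = i  # a new run of matches begins
            else:
                # Mismatch: the current run cannot be extended to the right.
                if run_start is not None and i - run_start >= min_len:
                    mems.append((run_start, run_start + d, i - run_start))
                run_start = None
            i += 1
            j += 1
        # Close a run that reaches the end of either sequence.
        if run_start is not None and i - run_start >= min_len:
            mems.append((run_start, run_start + d, i - run_start))
    return sorted(mems)


if __name__ == "__main__":
    print(find_mems("ACGTTGCA", "TTACGTAGCA", 3))  # [(0, 2, 4), (5, 7, 3)]
```

In the example, the read differs from the reference segment `ACGTAGCA` by a single substitution. The two MEMs fall on the same diagonal (offset 2), and the gap between them marks the erroneous base, which is the kind of seed pair a seed-and-extend method would then try to bridge.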
Please use this url to cite or link to this publication: http://hdl.handle.net/1854/LU-7244170
type
journalArticle (original)
publication status
published
keyword
IBCN, ALGORITHMS, CONSENSUS, ASSEMBLIES, DE-BRUIJN GRAPHS, ACCURATE, GENOME, EXTREME VALUE THEORY, Maximal exact matches, de Bruijn graph, Error correction, Sequence analysis, ALIGNMENT, SIMULATOR
journal title
ALGORITHMS FOR MOLECULAR BIOLOGY
Algorithms Mol. Biol.
volume
11
article number
10
pages
12 pages
Web of Science type
Article
Web of Science id
000375467200001
JCR category
MATHEMATICAL & COMPUTATIONAL BIOLOGY
JCR impact factor
1.786 (2016)
JCR rank
22/57 (2016)
JCR quartile
2 (2016)
ISSN
1748-7188
DOI
10.1186/s13015-016-0075-7
project
Bioinformatics: from nucleotids to networks (N2N)
language
English
UGent publication?
yes
classification
A1
copyright statement
I have retained and own the full copyright for this publication
id
7244170
handle
http://hdl.handle.net/1854/LU-7244170
date created
2016-06-06 12:19:37
date last changed
2017-02-07 09:09:11
@article{7244170,
  abstract     = {Background: Third generation sequencing platforms produce longer reads with higher error rates than second generation technologies. While the improved read length can provide useful information for downstream analysis, underlying algorithms are challenged by the high error rate. Error correction methods in which accurate short reads are used to correct noisy long reads appear to be attractive to generate high-quality long reads. Methods that align short reads to long reads do not optimally use the information contained in the second generation data, and suffer from large runtimes. Recently, a new hybrid error correcting method has been proposed, where the second generation data is first assembled into a de Bruijn graph, on which the long reads are then aligned. 
Results: In this context we present Jabba, a hybrid method to correct long third generation reads by mapping them on a corrected de Bruijn graph that was constructed from second generation data. Unique to our method is the use of a pseudo alignment approach with a seed-and-extend methodology, using maximal exact matches (MEMs) as seeds. In addition to benchmark results, certain theoretical results concerning the possibilities and limitations of the use of MEMs in the context of third generation reads are presented. 
Conclusion: Jabba produces highly reliable corrected reads: almost all corrected reads align to the reference, and these alignments have a very high identity. Many of the aligned reads are error-free. Additionally, Jabba corrects reads using a very low amount of CPU time. From this we conclude that pseudo alignment with MEMs is a fast and reliable method to map long highly erroneous sequences on a de Bruijn graph.},
  articleno    = {10},
  author       = {Miclotte, Giles and Heydari, Mahdi and Demeester, Piet and Rombauts, Stephane and Van de Peer, Yves and Audenaert, Pieter and Fostier, Jan},
  issn         = {1748-7188},
  journal      = {Algorithms for Molecular Biology},
  keyword      = {IBCN,ALGORITHMS,CONSENSUS,ASSEMBLIES,DE-BRUIJN GRAPHS,ACCURATE,GENOME,EXTREME VALUE THEORY,Maximal exact matches,de Bruijn graph,Error correction,Sequence analysis,ALIGNMENT,SIMULATOR},
  language     = {eng},
  pagetotal    = {12},
  title        = {{Jabba}: hybrid error correction for long sequencing reads},
  url          = {http://dx.doi.org/10.1186/s13015-016-0075-7},
  volume       = {11},
  year         = {2016},
}

Chicago
Miclotte, Giles, Mahdi Heydari, Piet Demeester, Stephane Rombauts, Yves Van de Peer, Pieter Audenaert, and Jan Fostier. 2016. “Jabba: Hybrid Error Correction for Long Sequencing Reads.” Algorithms for Molecular Biology 11: 10.
APA
Miclotte, G., Heydari, M., Demeester, P., Rombauts, S., Van de Peer, Y., Audenaert, P., & Fostier, J. (2016). Jabba: Hybrid error correction for long sequencing reads. Algorithms for Molecular Biology, 11, 10.
Vancouver
Miclotte G, Heydari M, Demeester P, Rombauts S, Van de Peer Y, Audenaert P, et al. Jabba: hybrid error correction for long sequencing reads. Algorithms Mol Biol. 2016;11:10.
MLA
Miclotte, Giles, Mahdi Heydari, Piet Demeester, et al. “Jabba: Hybrid Error Correction for Long Sequencing Reads.” Algorithms for Molecular Biology 11 (2016): 10. Print.