PaSiT : a novel approach based on short-oligonucleotide frequencies for efficient bacterial identification and typing

(2020) BIOINFORMATICS. 36(8). p.2337-2344
Abstract
Motivation: One of the most widespread methods used in taxonomy studies to distinguish between strains or taxa is the calculation of average nucleotide identity. It requires a computationally expensive alignment step and is therefore not suitable for large-scale comparisons. Short oligonucleotide-based methods do offer a faster alternative but at the expense of accuracy. Here, we aim to address this shortcoming by providing a software that implements a novel method based on short-oligonucleotide frequencies to compute inter-genomic distances.

Results: Our tetranucleotide and hexanucleotide implementations, which were optimized based on a taxonomically well-defined set of over 200 newly sequenced bacterial genomes, are as accurate as the short oligonucleotide-based method TETRA and average nucleotide identity, for identifying bacterial species and strains, respectively. Moreover, the lightweight nature of this method makes it applicable for large-scale analyses.
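The abstract does not describe how PaSiT computes its distances, so the sketch below is only a generic illustration of comparing two sequences by their short-oligonucleotide (k-mer) frequencies. It is not the PaSiT algorithm: the function names `kmer_frequencies` and `frequency_distance` are hypothetical, and the Euclidean distance is an arbitrary choice made for the example.

```python
from collections import Counter
from itertools import product
import math


def kmer_frequencies(sequence, k=4):
    """Return relative frequencies of all 4**k DNA k-mers in `sequence`.

    Hypothetical helper for illustration; not taken from PaSiT.
    """
    sequence = sequence.upper()
    # Count every overlapping window of length k.
    counts = Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))
    # Fixed k-mer order so that two profiles are comparable position by position.
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    # Windows containing other symbols (e.g. N) are left out of the total.
    total = sum(counts[kmer] for kmer in kmers)
    if total == 0:
        return [0.0] * len(kmers)
    return [counts[kmer] / total for kmer in kmers]


def frequency_distance(seq_a, seq_b, k=4):
    """Euclidean distance between two k-mer frequency profiles.

    The distance measure is an assumption chosen for simplicity.
    """
    freq_a = kmer_frequencies(seq_a, k)
    freq_b = kmer_frequencies(seq_b, k)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(freq_a, freq_b)))
```

With `k=4` this yields tetranucleotide profiles of 256 entries and with `k=6` hexanucleotide profiles of 4096 entries. No alignment step is involved, which is why frequency-based comparisons of this kind are cheap to compute.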
Keywords
Statistics and Probability, Computational Theory and Mathematics, Biochemistry, Molecular Biology, Computational Mathematics, Computer Science Applications, SPECIES CONCEPT, SEQUENCE, DNA, ALGORITHM

Downloads

  • 2020-PaSiT.pdf
    • full text (Published version) | open access | PDF | 8.51 MB

Citation

MLA
Goussarov, Gleb, et al. “PaSiT : A Novel Approach Based on Short-Oligonucleotide Frequencies for Efficient Bacterial Identification and Typing.” BIOINFORMATICS, vol. 36, no. 8, 2020, pp. 2337–44, doi:10.1093/bioinformatics/btz964.
APA
Goussarov, G., Cleenwerck, I., Mysara, M., Leys, N., Monsieurs, P., Tahon, G., … Van Houdt, R. (2020). PaSiT : a novel approach based on short-oligonucleotide frequencies for efficient bacterial identification and typing. BIOINFORMATICS, 36(8), 2337–2344. https://doi.org/10.1093/bioinformatics/btz964
Chicago author-date
Goussarov, Gleb, Ilse Cleenwerck, Mohamed Mysara, Natalie Leys, Pieter Monsieurs, Guillaume Tahon, Aurélien Carlier, Peter Vandamme, and Rob Van Houdt. 2020. “PaSiT : A Novel Approach Based on Short-Oligonucleotide Frequencies for Efficient Bacterial Identification and Typing.” BIOINFORMATICS 36 (8): 2337–44. https://doi.org/10.1093/bioinformatics/btz964.
Chicago author-date (all authors)
Goussarov, Gleb, Ilse Cleenwerck, Mohamed Mysara, Natalie Leys, Pieter Monsieurs, Guillaume Tahon, Aurélien Carlier, Peter Vandamme, and Rob Van Houdt. 2020. “PaSiT : A Novel Approach Based on Short-Oligonucleotide Frequencies for Efficient Bacterial Identification and Typing.” BIOINFORMATICS 36 (8): 2337–2344. doi:10.1093/bioinformatics/btz964.
Vancouver
1. Goussarov G, Cleenwerck I, Mysara M, Leys N, Monsieurs P, Tahon G, et al. PaSiT : a novel approach based on short-oligonucleotide frequencies for efficient bacterial identification and typing. BIOINFORMATICS. 2020;36(8):2337–44.
IEEE
[1] G. Goussarov et al., “PaSiT : a novel approach based on short-oligonucleotide frequencies for efficient bacterial identification and typing,” BIOINFORMATICS, vol. 36, no. 8, pp. 2337–2344, 2020.
BibTeX
@article{8663285,
  abstract     = {Motivation: One of the most widespread methods used in taxonomy studies to distinguish between strains or taxa is the calculation of average nucleotide identity. It requires a computationally expensive alignment step and is therefore not suitable for large-scale comparisons. Short oligonucleotide-based methods do offer a faster alternative but at the expense of accuracy. Here, we aim to address this shortcoming by providing a software that implements a novel method based on short-oligonucleotide frequencies to compute inter-genomic distances.

Results: Our tetranucleotide and hexanucleotide implementations, which were optimized based on a taxonomically well-defined set of over 200 newly sequenced bacterial genomes, are as accurate as the short oligonucleotide-based method TETRA and average nucleotide identity, for identifying bacterial species and strains, respectively. Moreover, the lightweight nature of this method makes it applicable for large-scale analyses.},
  author       = {Goussarov, Gleb and Cleenwerck, Ilse and Mysara, Mohamed and Leys, Natalie and Monsieurs, Pieter and Tahon, Guillaume and Carlier, Aurélien and Vandamme, Peter and Van Houdt, Rob},
  issn         = {1367-4803},
  journal      = {BIOINFORMATICS},
  keywords     = {Statistics and Probability,Computational Theory and Mathematics,Biochemistry,Molecular Biology,Computational Mathematics,Computer Science Applications,SPECIES CONCEPT,SEQUENCE,DNA,ALGORITHM},
  language     = {eng},
  number       = {8},
  pages        = {2337--2344},
  title        = {PaSiT : a novel approach based on short-oligonucleotide frequencies for efficient bacterial identification and typing},
  url          = {http://dx.doi.org/10.1093/bioinformatics/btz964},
  volume       = {36},
  year         = {2020},
}
