Ghent University Academic Bibliography


From learning taxonomies to phylogenetic learning: integration of 16S rRNA gene data into FAME-based bacterial classification

Bram Slabbinck, Willem Waegeman, Peter Dawyndt, Paul De Vos and Bernard De Baets (all UGent) (2010) BMC Bioinformatics 11(1).
abstract
Background: Machine learning techniques have been shown to improve bacterial species classification based on fatty acid methyl ester (FAME) data. Nonetheless, FAME analysis has limited resolution for discriminating bacteria at the species level. In this paper, we approach the species classification problem from a taxonomic point of view. Such a taxonomy or tree is typically obtained by applying clustering algorithms to FAME data or to 16S rRNA gene data. The knowledge gained from the tree can then be used to evaluate FAME-based classifiers, resulting in a novel framework for bacterial species classification.
Results: With a view to learning in a taxonomic framework, we consider two types of trees. First, a FAME tree is constructed with a supervised divisive clustering algorithm. Second, phylogenetic trees are inferred from 16S rRNA gene sequences with the neighbor-joining (NJ) and UPGMA methods. In this second approach, the species classification problem combines two different types of data: 16S rRNA gene sequence data are used for phylogenetic tree inference, and the corresponding binary tree splits are learned from FAME data. We call this learning approach 'phylogenetic learning'. Supervised Random Forest models are trained for the classification tasks in a stratified cross-validation setting. In this way, better classification results are obtained for species that are typically hard to distinguish with a single, flat multi-class classification model.
Conclusions: FAME-based bacterial species classification is successfully evaluated in a taxonomic framework. Although the proposed approach does not improve the overall accuracy compared to flat multi-class classification, it has some distinct advantages. First, it is better at distinguishing species on which flat multi-class classification fails. Second, the hierarchical classification structure makes it easy to evaluate and visualize the resolution of FAME data for the discrimination of bacterial species. In summary, phylogenetic learning allows us to situate and evaluate FAME-based bacterial species classification in a more informative context.
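The "phylogenetic learning" idea in the abstract (a binary tree fixed in advance, with one binary classifier per internal split trained on FAME profiles) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the toy tree, the species labels "A" to "D" and the two-component profiles are hypothetical, and a nearest-centroid rule stands in for the per-node Random Forest models used in the paper. Tree inference (NJ/UPGMA) and stratified cross-validation are left out.

```python
from math import dist
from statistics import fmean

# A tree is either a leaf (a species label, str) or a pair (left, right).
# Hypothetical toy tree; in the paper, trees come from NJ/UPGMA on 16S data.
TREE = (("A", "B"), ("C", "D"))


def leaves(tree):
    """Return the set of species labels below a (sub)tree."""
    if isinstance(tree, str):
        return {tree}
    left, right = tree
    return leaves(left) | leaves(right)


def centroid(profiles):
    """Component-wise mean of equal-length FAME profiles."""
    return [fmean(column) for column in zip(*profiles)]


def train(tree, X, y):
    """Fit one binary split per internal node of a fixed tree.

    Each node keeps the centroid of the training profiles whose species lie
    in its left subtree and in its right subtree. This nearest-centroid rule
    is a stand-in for the per-node Random Forest models of the paper.
    Assumes every species in the tree has at least one training profile.
    """
    if isinstance(tree, str):
        return tree
    left, right = tree
    in_left, in_right = leaves(left), leaves(right)
    left_X = [x for x, label in zip(X, y) if label in in_left]
    right_X = [x for x, label in zip(X, y) if label in in_right]
    return {
        "centroids": (centroid(left_X), centroid(right_X)),
        "children": (train(left, X, y), train(right, X, y)),
    }


def predict(model, x):
    """Route a profile from the root to a leaf, one binary decision per node."""
    while not isinstance(model, str):
        left_c, right_c = model["centroids"]
        model = model["children"][0 if dist(x, left_c) <= dist(x, right_c) else 1]
    return model


# Hypothetical two-component "FAME profiles" (relative amounts of two fatty acids).
X = [[0.9, 0.1], [0.8, 0.2], [0.6, 0.4], [0.7, 0.3],
     [0.1, 0.9], [0.2, 0.8], [0.4, 0.6], [0.3, 0.7]]
y = ["A", "A", "B", "B", "C", "C", "D", "D"]

model = train(TREE, X, y)
print(predict(model, [0.88, 0.12]), predict(model, [0.32, 0.68]))  # A D
```

Because each internal node is a separate binary problem, its accuracy can be inspected on its own, which is how the hierarchical structure shows which splits of the tree FAME data can and cannot resolve.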
type
journalArticle (original)
publication status
published
keyword
TREE, TOOL, IDENTIFICATION, GENOMIC ERA, SPECIES DEFINITION, FATTY-ACID PROFILES, AD-HOC-COMMITTEE, FRAGMENTS
journal title
BMC Bioinformatics
volume
11
issue
1
Web of Science type
Article
Web of Science id
000275200300001
JCR category
MATHEMATICAL & COMPUTATIONAL BIOLOGY
JCR impact factor
3.028 (2010)
JCR rank
4/35 (2010)
JCR quartile
1 (2010)
ISSN
1471-2105
DOI
10.1186/1471-2105-11-69
language
English
UGent publication?
yes
classification
A1
additional info
article no. 69 (16 p.)
copyright statement
I have retained and own the full copyright for this publication
id
846610
handle
http://hdl.handle.net/1854/LU-846610
date created
2010-01-30 11:50:06
date last changed
2010-05-26 15:56:35
@article{846610,
  abstract     = {Background: Machine learning techniques have been shown to improve bacterial species classification based on fatty acid methyl ester (FAME) data. Nonetheless, FAME analysis has limited resolution for discriminating bacteria at the species level. In this paper, we approach the species classification problem from a taxonomic point of view. Such a taxonomy or tree is typically obtained by applying clustering algorithms to FAME data or to 16S rRNA gene data. The knowledge gained from the tree can then be used to evaluate FAME-based classifiers, resulting in a novel framework for bacterial species classification.
Results: With a view to learning in a taxonomic framework, we consider two types of trees. First, a FAME tree is constructed with a supervised divisive clustering algorithm. Second, phylogenetic trees are inferred from 16S rRNA gene sequences with the neighbor-joining (NJ) and UPGMA methods. In this second approach, the species classification problem combines two different types of data: 16S rRNA gene sequence data are used for phylogenetic tree inference, and the corresponding binary tree splits are learned from FAME data. We call this learning approach 'phylogenetic learning'. Supervised Random Forest models are trained for the classification tasks in a stratified cross-validation setting. In this way, better classification results are obtained for species that are typically hard to distinguish with a single, flat multi-class classification model.
Conclusions: FAME-based bacterial species classification is successfully evaluated in a taxonomic framework. Although the proposed approach does not improve the overall accuracy compared to flat multi-class classification, it has some distinct advantages. First, it is better at distinguishing species on which flat multi-class classification fails. Second, the hierarchical classification structure makes it easy to evaluate and visualize the resolution of FAME data for the discrimination of bacterial species. In summary, phylogenetic learning allows us to situate and evaluate FAME-based bacterial species classification in a more informative context.},
  author       = {Slabbinck, Bram and Waegeman, Willem and Dawyndt, Peter and De Vos, Paul and De Baets, Bernard},
  issn         = {1471-2105},
  journal      = {BMC Bioinformatics},
  keyword      = {TREE,TOOL,IDENTIFICATION,GENOMIC ERA,SPECIES DEFINITION,FATTY-ACID PROFILES,AD-HOC-COMMITTEE,FRAGMENTS},
  language     = {eng},
  number       = {1},
  title        = {From learning taxonomies to phylogenetic learning: integration of 16S rRNA gene data into FAME-based bacterial classification},
  url          = {http://dx.doi.org/10.1186/1471-2105-11-69},
  volume       = {11},
  year         = {2010},
}

Chicago
Slabbinck, Bram, Willem Waegeman, Peter Dawyndt, Paul De Vos, and Bernard De Baets. 2010. “From Learning Taxonomies to Phylogenetic Learning: Integration of 16S rRNA Gene Data into FAME-Based Bacterial Classification.” BMC Bioinformatics 11 (1): 69.
APA
Slabbinck, B., Waegeman, W., Dawyndt, P., De Vos, P., & De Baets, B. (2010). From learning taxonomies to phylogenetic learning: Integration of 16S rRNA gene data into FAME-based bacterial classification. BMC Bioinformatics, 11(1), 69. https://doi.org/10.1186/1471-2105-11-69
Vancouver
1. Slabbinck B, Waegeman W, Dawyndt P, De Vos P, De Baets B. From learning taxonomies to phylogenetic learning: integration of 16S rRNA gene data into FAME-based bacterial classification. BMC Bioinformatics. 2010;11(1):69.
MLA
Slabbinck, Bram, et al. “From Learning Taxonomies to Phylogenetic Learning: Integration of 16S rRNA Gene Data into FAME-Based Bacterial Classification.” BMC Bioinformatics 11.1 (2010): 69.