
Discriminative and informative features for biomolecular text mining with ensemble feature selection

Sofie Van Landeghem (UGent) , Thomas Abeel (UGent) , Yvan Saeys (UGent) and Yves Van de Peer (UGent)
(2010) BIOINFORMATICS. 26(18). p.i554-i560
Abstract
Motivation: In the field of biomolecular text mining, black box behavior of machine learning systems currently limits understanding of the true nature of the predictions. However, feature selection (FS) is capable of identifying the most relevant features in any supervised learning setting, providing insight into the specific properties of the classification algorithm. This allows us to build more accurate classifiers while at the same time bridging the gap between the black box behavior and the end-user who has to interpret the results.
Results: We show that our FS methodology successfully discards a large fraction of machine-generated features, improving classification performance of state-of-the-art text mining algorithms. Furthermore, we illustrate how FS can be applied to gain understanding in the predictions of a framework for biomolecular event extraction from text. We include numerous examples of highly discriminative features that model either biological reality or common linguistic constructs. Finally, we discuss a number of insights from our FS analyses that will provide the opportunity to considerably improve upon current text mining tools.
Availability: The FS algorithms and classifiers are available in Java-ML (http://java-ml.sf.net). The datasets are publicly available from the BioNLP'09 Shared Task web site (http://www-tsujii.is.s.u-tokyo.ac.jp/GENIA/SharedTask/).

Downloads

  • Van Landeghem et al. 2010 Bioinformatics26 i554.pdf
    • full text | open access | PDF | 354.31 KB

Citation

Chicago
Van Landeghem, Sofie, Thomas Abeel, Yvan Saeys, and Yves Van de Peer. 2010. “Discriminative and Informative Features for Biomolecular Text Mining with Ensemble Feature Selection.” Bioinformatics 26 (18): i554–i560.
APA
Van Landeghem, S., Abeel, T., Saeys, Y., & Van de Peer, Y. (2010). Discriminative and informative features for biomolecular text mining with ensemble feature selection. BIOINFORMATICS, 26(18), i554–i560. Presented at the 9th European Conference on Computational Biology.
Vancouver
1. Van Landeghem S, Abeel T, Saeys Y, Van de Peer Y. Discriminative and informative features for biomolecular text mining with ensemble feature selection. BIOINFORMATICS. 2010;26(18):i554–i560.
MLA
Van Landeghem, Sofie, Thomas Abeel, Yvan Saeys, et al. “Discriminative and Informative Features for Biomolecular Text Mining with Ensemble Feature Selection.” BIOINFORMATICS 26.18 (2010): i554–i560. Print.
@article{1061956,
  abstract     = {Motivation: In the field of biomolecular text mining, black box behavior of machine learning systems currently limits understanding of the true nature of the predictions. However, feature selection (FS) is capable of identifying the most relevant features in any supervised learning setting, providing insight into the specific properties of the classification algorithm. This allows us to build more accurate classifiers while at the same time bridging the gap between the black box behavior and the end-user who has to interpret the results.
Results: We show that our FS methodology successfully discards a large fraction of machine-generated features, improving classification performance of state-of-the-art text mining algorithms. Furthermore, we illustrate how FS can be applied to gain understanding in the predictions of a framework for biomolecular event extraction from text. We include numerous examples of highly discriminative features that model either biological reality or common linguistic constructs. Finally, we discuss a number of insights from our FS analyses that will provide the opportunity to considerably improve upon current text mining tools.
Availability: The FS algorithms and classifiers are available in Java-ML (http://java-ml.sf.net). The datasets are publicly available from the BioNLP'09 Shared Task web site (http://www-tsujii.is.s.u-tokyo.ac.jp/GENIA/SharedTask/).},
  author       = {Van Landeghem, Sofie and Abeel, Thomas and Saeys, Yvan and Van de Peer, Yves},
  issn         = {1367-4803},
  journal      = {BIOINFORMATICS},
  language     = {eng},
  location     = {Ghent, Belgium},
  number       = {18},
  pages        = {i554--i560},
  title        = {Discriminative and informative features for biomolecular text mining with ensemble feature selection},
  url          = {http://dx.doi.org/10.1093/bioinformatics/btq381},
  volume       = {26},
  year         = {2010},
}
