Semantically linking molecular entities in literature through entity relationships

Project
Bioinformatics: from nucleotides to networks (N2N)
Abstract
Background: Text mining tools have gained popularity for processing the vast amount of available research articles in the biomedical literature. It is crucial that such tools extract information with a sufficient level of detail to be applicable in real-life scenarios. Studies on mining non-causal molecular relations contribute to this goal by formally identifying the relations between genes, promoters, complexes and various other molecular entities found in text. More importantly, these studies help to improve the integration of text mining results with database facts. Results: We describe, compare and evaluate two frameworks developed for the prediction of non-causal or 'entity' relations (REL) between gene symbols and domain terms. For the corresponding REL challenge of the BioNLP Shared Task of 2011, these systems ranked first (57.7% F-score) and second (41.6% F-score). In this paper, we investigate the performance discrepancy of 16 percentage points by benchmarking on a related and more extensive dataset, analysing the contribution of both the term detection and relation extraction modules. We further construct a hybrid system combining the two frameworks and experiment with intersection and union combinations, achieving high-precision and high-recall results, respectively. Finally, we highlight extremely high-performance results (F-score >90%) obtained for the specific subclass of embedded entity relations that are essential for integrating text mining predictions with database facts. Conclusions: The results from this study will enable us, in the near future, to annotate semantic relations between molecular entities in the entire scientific literature available through PubMed. The recent release of the EVEX dataset, containing biomolecular event predictions for millions of PubMed articles, is an interesting and exciting opportunity to overlay these entity relations with event predictions on a literature-wide scale.
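The abstract describes combining the two frameworks through the intersection or union of their predictions. The following is a minimal, hypothetical sketch (not the authors' implementation) of why this trades precision against recall. It assumes relations can be modelled as (gene, term) pairs; the functions score and combine and all data below are invented for illustration.

```python
# Hypothetical sketch: combine two relation extractors by set intersection
# or union and score the result against gold annotations. Relations are
# modelled as (gene, term) pairs; this representation and all data below
# are illustrative assumptions, not the paper's format or results.

Relation = tuple[str, str]


def score(predicted: set[Relation], gold: set[Relation]) -> tuple[float, float, float]:
    """Return (precision, recall, F-score) of the predictions against gold."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_score = 2 * precision * recall / (precision + recall)
    return precision, recall, f_score


def combine(a: set[Relation], b: set[Relation], mode: str) -> set[Relation]:
    """Intersection keeps relations both systems predict; union keeps either."""
    if mode == "intersection":
        return a & b
    if mode == "union":
        return a | b
    raise ValueError(f"unknown mode: {mode!r}")


# Invented toy data: four gold relations, two systems with different errors.
gold = {("g1", "t1"), ("g2", "t2"), ("g3", "t3"), ("g4", "t4")}
system_a = {("g1", "t1"), ("g2", "t2"), ("g5", "t5")}
system_b = {("g1", "t1"), ("g3", "t3"), ("g6", "t6")}

for mode in ("intersection", "union"):
    p, r, f = score(combine(system_a, system_b, mode), gold)
    print(f"{mode:12s} P={p:.2f} R={r:.2f} F={f:.2f}")
```

With this toy data, the intersection keeps only the one relation both systems agree on (precision 1.00, recall 0.25), while the union recovers more gold relations at the cost of extra false positives (precision 0.60, recall 0.75). The recall of an intersection can never exceed that of either system, and the recall of a union can never fall below it; the effect on precision is typical rather than guaranteed.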
Keywords
CORPUS, FEATURES, EXTRACTION

Downloads

  • Van Landeghem et al. 2012 BMC Bioinformatics 13 S6.pdf (full text, open access, PDF, 548.25 KB)

Citation

Chicago
Van Landeghem, Sofie, Jari Björne, Thomas Abeel, Bernard De Baets, Tapio Salakoski, and Yves Van de Peer. 2012. “Semantically Linking Molecular Entities in Literature Through Entity Relationships.” BMC Bioinformatics 13.
APA
Van Landeghem, S., Björne, J., Abeel, T., De Baets, B., Salakoski, T., & Van de Peer, Y. (2012). Semantically linking molecular entities in literature through entity relationships. BMC Bioinformatics, 13. Presented at the Conference on BioNLP Shared Task.
Vancouver
Van Landeghem S, Björne J, Abeel T, De Baets B, Salakoski T, Van de Peer Y. Semantically linking molecular entities in literature through entity relationships. BMC Bioinformatics. 2012;13.
MLA
Van Landeghem, Sofie et al. “Semantically Linking Molecular Entities in Literature Through Entity Relationships.” BMC Bioinformatics 13 (2012): n. pag. Print.
@article{2974138,
  abstract     = {Background: Text mining tools have gained popularity for processing the vast amount of available research articles in the biomedical literature. It is crucial that such tools extract information with a sufficient level of detail to be applicable in real-life scenarios. Studies on mining non-causal molecular relations contribute to this goal by formally identifying the relations between genes, promoters, complexes and various other molecular entities found in text. More importantly, these studies help to improve the integration of text mining results with database facts.
Results: We describe, compare and evaluate two frameworks developed for the prediction of non-causal or 'entity' relations (REL) between gene symbols and domain terms. For the corresponding REL challenge of the BioNLP Shared Task of 2011, these systems ranked first (57.7\% F-score) and second (41.6\% F-score). In this paper, we investigate the performance discrepancy of 16 percentage points by benchmarking on a related and more extensive dataset, analysing the contribution of both the term detection and relation extraction modules. We further construct a hybrid system combining the two frameworks and experiment with intersection and union combinations, achieving high-precision and high-recall results, respectively. Finally, we highlight extremely high-performance results (F-score $>$90\%) obtained for the specific subclass of embedded entity relations that are essential for integrating text mining predictions with database facts.
Conclusions: The results from this study will enable us in the near future to annotate semantic relations between molecular entities in the entire scientific literature available through PubMed. The recent release of the EVEX dataset, containing biomolecular event predictions for millions of PubMed articles, is an interesting and exciting opportunity to overlay these entity relations with event predictions on a literature-wide scale.},
  articleno    = {S6},
  author       = {Van Landeghem, Sofie and Bj{\"o}rne, Jari and Abeel, Thomas and De Baets, Bernard and Salakoski, Tapio and Van de Peer, Yves},
  issn         = {1471-2105},
  journal      = {BMC Bioinformatics},
  language     = {eng},
  location     = {Portland, OR, USA},
  numpages     = {9},
  title        = {Semantically linking molecular entities in literature through entity relationships},
  url          = {http://dx.doi.org/10.1186/1471-2105-13-S11-S6},
  volume       = {13},
  year         = {2012},
}
