PubMed-scale event extraction for post-translational modifications, epigenetics and protein structural relations

Abstract
Recent efforts in biomolecular event extraction have mainly focused on core event types involving genes and proteins, such as gene expression, protein-protein interactions, and protein catabolism. The BioNLP’11 Shared Task extended the event extraction approach to sub-protein events and relations in the Epigenetics and Post-translational Modifications (EPI) and Protein Relations (REL) tasks. In this study, we apply the Turku Event Extraction System, the best-performing system for these tasks, to all PubMed abstracts and all available PMC full-text articles, extracting 1.4M EPI events and 2.2M REL relations from 21M abstracts and 372K articles. We introduce several entity normalization algorithms for genes, proteins, protein complexes and protein components, aiming to uniquely identify these biological entities. This normalization effort allows direct mapping of the extracted events and relations to post-translational modifications from UniProt, epigenetics from PubMeth, functional domains from InterPro and macromolecular structures from PDB. The extraction of such detailed protein information provides a unique text mining dataset, offering the opportunity to further deepen the information provided by existing PubMed-scale event extraction efforts. The methods and data introduced in this study are freely available from bionlp.utu.fi.

Citation

Chicago
Björne, Jari, Sofie Van Landeghem, Sampo Pyysalo, Tomoko Ohta, Filip Ginter, Yves Van de Peer, Sophia Ananiadou, and Tapio Salakoski. 2012. “PubMed-scale Event Extraction for Post-translational Modifications, Epigenetics and Protein Structural Relations.” In Proceedings of the 2012 Workshop on Biomedical Natural Language Processing, 82–90. Association for Computational Linguistics (ACL).
APA
Björne, J., Van Landeghem, S., Pyysalo, S., Ohta, T., Ginter, F., Van de Peer, Y., Ananiadou, S., et al. (2012). PubMed-scale event extraction for post-translational modifications, epigenetics and protein structural relations. Proceedings of the 2012 workshop on biomedical natural language processing (pp. 82–90). Presented at the 2012 Workshop on Biomedical Natural Language Processing (BioNLP 2012), Association for Computational Linguistics (ACL).
Vancouver
1. Björne J, Van Landeghem S, Pyysalo S, Ohta T, Ginter F, Van de Peer Y, et al. PubMed-scale event extraction for post-translational modifications, epigenetics and protein structural relations. Proceedings of the 2012 workshop on biomedical natural language processing. Association for Computational Linguistics (ACL); 2012. p. 82–90.
MLA
Björne, Jari, Sofie Van Landeghem, Sampo Pyysalo, et al. “PubMed-scale Event Extraction for Post-translational Modifications, Epigenetics and Protein Structural Relations.” Proceedings of the 2012 Workshop on Biomedical Natural Language Processing. Association for Computational Linguistics (ACL), 2012. 82–90. Print.
BibTeX
@inproceedings{2976876,
  abstract     = {Recent efforts in biomolecular event extraction have mainly focused on core event types involving genes and proteins, such as gene expression, protein-protein interactions, and protein catabolism. The BioNLP’11 Shared Task extended the event extraction approach to sub-protein events and relations in the Epigenetics and Post-translational Modifications (EPI) and Protein Relations (REL) tasks. In this study, we apply the Turku Event Extraction System, the best-performing system for these tasks, to all PubMed abstracts and all available PMC full-text articles, extracting 1.4M EPI events and 2.2M REL relations from 21M abstracts and 372K articles. We introduce several entity normalization algorithms for genes, proteins, protein complexes and protein components, aiming to uniquely identify these biological entities. This normalization effort allows direct mapping of the extracted events and relations to post-translational modifications from UniProt, epigenetics from PubMeth, functional domains from InterPro and macromolecular structures from PDB. The extraction of such detailed protein information provides a unique text mining dataset, offering the opportunity to further deepen the information provided by existing PubMed-scale event extraction efforts. The methods and data introduced in this study are freely available from bionlp.utu.fi.},
  author       = {Björne, Jari and Van Landeghem, Sofie and Pyysalo, Sampo and Ohta, Tomoko and Ginter, Filip and Van de Peer, Yves and Ananiadou, Sophia and Salakoski, Tapio},
  booktitle    = {Proceedings of the 2012 workshop on biomedical natural language processing},
  language     = {eng},
  location     = {Montréal, QC, Canada},
  pages        = {82--90},
  publisher    = {Association for Computational Linguistics (ACL)},
  title        = {PubMed-scale event extraction for post-translational modifications, epigenetics and protein structural relations},
  year         = {2012},
}