
Tagging terms in text: a supervised sequential labelling approach to automatic term extraction

Ayla Rigouts Terryn (UGent), Veronique Hoste (UGent) and Els Lefever (UGent)
(2022) Terminology 28(1): 157–189
Abstract
As with many tasks in natural language processing, automatic term extraction (ATE) is increasingly approached as a machine learning problem. So far, most machine learning approaches to ATE broadly follow the traditional hybrid methodology. They first extract a list of unique candidate terms and then classify these candidates based on the predicted probability that they are valid terms. However, with the rise of neural networks and word embeddings, the next development in ATE might be towards sequential approaches, i.e., classifying each occurrence of each token within its original context. To test the validity of such approaches for ATE, two sequential methodologies were developed, evaluated, and compared: a feature-based conditional random fields classifier and an embedding-based recurrent neural network. Both were additionally compared with a machine learning interpretation of the traditional approach. All systems were trained and evaluated on identical data in multiple languages and domains to identify their respective strengths and weaknesses. The sequential methodologies proved to be valid approaches to ATE, and the neural network even outperformed the more traditional approach. Interestingly, a combination of multiple approaches can outperform each of them individually, showing new ways to push the state of the art in ATE.
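The difference between the candidate-list methodology and the sequential one can be illustrated with a small Python sketch. It assumes an IOB labelling scheme: B marks the first token of a term, I marks a token inside a term, and O marks a token outside any term. IOB is a common choice for sequential labelling, but this record does not state which scheme the paper actually used. The functions `spans_to_iob`, `iob_to_spans` and `unique_terms` are hypothetical. The hand-written labels in the usage example stand in for the predictions that a trained model, such as the CRF or RNN described above, would produce.

```python
from typing import List, Optional, Set, Tuple

# A term occurrence as a token span [start, end); end is exclusive.
Span = Tuple[int, int]


def spans_to_iob(n_tokens: int, spans: List[Span]) -> List[str]:
    """Encode annotated term occurrences as one IOB label per token."""
    labels = ["O"] * n_tokens
    for start, end in spans:
        labels[start] = "B"  # first token of the term
        for i in range(start + 1, end):
            labels[i] = "I"  # remaining tokens of the term
    return labels


def iob_to_spans(labels: List[str]) -> List[Span]:
    """Decode per-token IOB labels (each one of "B", "I", "O") into term spans.

    A stray "I" without an open term is treated as the start of a term,
    which is one common repair strategy (an assumption, not the paper's rule).
    """
    spans: List[Span] = []
    start: Optional[int] = None
    for i, label in enumerate(labels):
        if label == "B" or (label == "I" and start is None):
            if start is not None:
                spans.append((start, i))  # close the previous term
            start = i
        elif label == "O":
            if start is not None:
                spans.append((start, i))
            start = None
        # otherwise: "I" with an open term, so the term simply continues
    if start is not None:
        spans.append((start, len(labels)))  # term runs to the end of the sentence
    return spans


def unique_terms(sentences: List[List[str]], predictions: List[List[str]]) -> Set[str]:
    """Collapse predicted occurrences into a set of unique (lowercased) terms,
    i.e. the kind of output the traditional candidate-list approach produces."""
    terms: Set[str] = set()
    for tokens, labels in zip(sentences, predictions):
        for start, end in iob_to_spans(labels):
            terms.add(" ".join(tokens[start:end]).lower())
    return terms


if __name__ == "__main__":
    sentences = [
        "The heart rate and blood pressure were measured".split(),
        "Heart rate increased".split(),
    ]
    # Hypothetical labels standing in for the output of a trained tagger.
    predictions = [
        spans_to_iob(len(sentences[0]), [(1, 3), (4, 6)]),
        spans_to_iob(len(sentences[1]), [(0, 2)]),
    ]
    print(predictions[0])  # ['O', 'B', 'I', 'O', 'B', 'I', 'O', 'O']
    print(sorted(unique_terms(sentences, predictions)))  # ['blood pressure', 'heart rate']
```

The sequential approach labels every occurrence in its own context, so the same string can be tagged as a term in one sentence and not in another. Lowercasing and collapsing the predictions into a set, as `unique_terms` does here, is one assumed way to compare that per-occurrence output with a list of unique terms.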
Keywords
lt3, ATE, automatic term extraction, terminology, sequential labelling, RECOGNITION

Downloads

  • Terminology sequentialATE final.pdf: full text (Accepted manuscript), open access, PDF, 553.89 KB
  • (...).pdf: full text (Published version), UGent only, PDF, 519.70 KB

Citation


MLA
Rigouts Terryn, Ayla, et al. “Tagging Terms in Text: A Supervised Sequential Labelling Approach to Automatic Term Extraction.” Terminology, vol. 28, no. 1, 2022, pp. 157–89, doi:10.1075/term.21010.rig.
APA
Rigouts Terryn, A., Hoste, V., & Lefever, E. (2022). Tagging terms in text: A supervised sequential labelling approach to automatic term extraction. Terminology, 28(1), 157–189. https://doi.org/10.1075/term.21010.rig
Chicago author-date
Rigouts Terryn, Ayla, Veronique Hoste, and Els Lefever. 2022. “Tagging Terms in Text: A Supervised Sequential Labelling Approach to Automatic Term Extraction.” Terminology 28 (1): 157–89. https://doi.org/10.1075/term.21010.rig.
Chicago author-date (all authors)
Rigouts Terryn, Ayla, Veronique Hoste, and Els Lefever. 2022. “Tagging Terms in Text: A Supervised Sequential Labelling Approach to Automatic Term Extraction.” Terminology 28 (1): 157–189. doi:10.1075/term.21010.rig.
Vancouver
1. Rigouts Terryn A, Hoste V, Lefever E. Tagging terms in text: a supervised sequential labelling approach to automatic term extraction. Terminology. 2022;28(1):157–89.
IEEE
[1] A. Rigouts Terryn, V. Hoste, and E. Lefever, “Tagging terms in text: a supervised sequential labelling approach to automatic term extraction,” Terminology, vol. 28, no. 1, pp. 157–189, 2022.
BibTeX
@article{8725984,
  abstract     = {{As with many tasks in natural language processing, automatic term extraction (ATE) is increasingly approached as a machine learning problem. So far, most machine learning approaches to ATE broadly follow the traditional hybrid methodology. They first extract a list of unique candidate terms and then classify these candidates based on the predicted probability that they are valid terms. However, with the rise of neural networks and word embeddings, the next development in ATE might be towards sequential approaches, i.e., classifying each occurrence of each token within its original context. To test the validity of such approaches for ATE, two sequential methodologies were developed, evaluated, and compared: a feature-based conditional random fields classifier and an embedding-based recurrent neural network. Both were additionally compared with a machine learning interpretation of the traditional approach. All systems were trained and evaluated on identical data in multiple languages and domains to identify their respective strengths and weaknesses. The sequential methodologies proved to be valid approaches to ATE, and the neural network even outperformed the more traditional approach. Interestingly, a combination of multiple approaches can outperform each of them individually, showing new ways to push the state of the art in ATE.}},
  author       = {{Rigouts Terryn, Ayla and Hoste, Veronique and Lefever, Els}},
  doi          = {{10.1075/term.21010.rig}},
  issn         = {{0929-9971}},
  journal      = {{Terminology}},
  keywords     = {{lt3,ATE,automatic term extraction,terminology,sequential labelling,RECOGNITION}},
  language     = {{eng}},
  number       = {{1}},
  pages        = {{157--189}},
  title        = {{Tagging terms in text: a supervised sequential labelling approach to automatic term extraction}},
  url          = {{https://doi.org/10.1075/term.21010.rig}},
  volume       = {{28}},
  year         = {{2022}},
}
