
Creation and evaluation of large keyphrase extraction collections with multiple opinions

Lucas Sterckx (UGent), Thomas Demeester (UGent), Johannes Deleu (UGent) and Chris Develder (UGent)
Abstract
While several automatic keyphrase extraction (AKE) techniques have been developed and analyzed, there is little consensus on the definition of the task and a lack of overview of the effectiveness of different techniques. Proper evaluation of keyphrase extraction requires large test collections with multiple opinions, currently not available for research. In this paper, we (i) present a set of test collections derived from various sources with multiple annotations (which we also refer to as opinions in the remainder of the paper) for each document, (ii) systematically evaluate keyphrase extraction using several supervised and unsupervised AKE techniques, and (iii) experimentally analyze the effects of disagreement on AKE evaluation. Our newly created set of test collections spans different types of topical content from general news and magazines, and each article is annotated multiple times by a large annotator panel. Our annotator study shows that for a given document there seems to be substantial disagreement on the preferred keyphrases, suggesting the need for multiple opinions per document. A first systematic evaluation of ranking and classification of keyphrases using both unsupervised and supervised AKE techniques on the test collections shows the superior effectiveness of supervised models, even with low annotation effort and with basic positional and frequency features, and highlights the importance of a suitable keyphrase candidate generation approach. We also study the influence of multiple opinions, training data, and document length on the evaluation of keyphrase extraction. Our new test collection for keyphrase extraction is one of the largest of its kind and will be made available to stimulate future work on more reliable evaluation of new keyphrase extractors.
Keywords
Automatic keyphrase extraction, Test collections, Annotator disagreement
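
To make the role of multiple opinions concrete, the sketch below scores a set of extracted keyphrases against several annotators' keyphrase sets by averaging exact-match F1 over annotators. This is only an illustration of evaluating against multiple opinions, not the paper's exact protocol; the data and function names are hypothetical.

from statistics import mean

def keyphrase_f1(predicted, gold):
    # Exact-match F1 between predicted keyphrases and one annotator's keyphrases.
    overlap = len(predicted & gold)
    if overlap == 0:
        return 0.0
    precision = overlap / len(predicted)
    recall = overlap / len(gold)
    return 2 * precision * recall / (precision + recall)

def multi_opinion_f1(predicted, opinions):
    # Average F1 over all annotators' keyphrase sets for a single document.
    return mean(keyphrase_f1(predicted, gold) for gold in opinions)

# Hypothetical annotations: three annotators disagree on the preferred keyphrases.
opinions = [
    {"keyphrase extraction", "test collections"},
    {"keyphrase extraction", "annotator disagreement"},
    {"test collections", "evaluation"},
]
predicted = {"keyphrase extraction", "test collections"}
print(multi_opinion_f1(predicted, opinions))  # ~0.67: full credit from one opinion, partial from the others

Averaging over opinions rather than collapsing them into a single gold set keeps credit for keyphrases that only some annotators chose, which matters when disagreement is large.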

Downloads

  • (...).pdf: full text | UGent only | PDF | 2.56 MB
  • 6967 i.pdf: full text | open access | PDF | 2.21 MB

Citation

Please use this url to cite or link to this publication:

Chicago
Sterckx, Lucas, Thomas Demeester, Johannes Deleu, and Chris Develder. 2018. “Creation and Evaluation of Large Keyphrase Extraction Collections with Multiple Opinions.” Language Resources and Evaluation 52 (2): 503–532.
APA
Sterckx, L., Demeester, T., Deleu, J., & Develder, C. (2018). Creation and evaluation of large keyphrase extraction collections with multiple opinions. LANGUAGE RESOURCES AND EVALUATION, 52(2), 503–532.
Vancouver
1. Sterckx L, Demeester T, Deleu J, Develder C. Creation and evaluation of large keyphrase extraction collections with multiple opinions. LANGUAGE RESOURCES AND EVALUATION. Dordrecht: Springer; 2018;52(2):503–32.
MLA
Sterckx, Lucas, Thomas Demeester, Johannes Deleu, et al. “Creation and Evaluation of Large Keyphrase Extraction Collections with Multiple Opinions.” LANGUAGE RESOURCES AND EVALUATION 52.2 (2018): 503–532. Print.
@article{8563585,
  abstract     = {While several automatic keyphrase extraction (AKE) techniques have been developed and analyzed, there is little consensus on the definition of the task and a lack of overview of the effectiveness of different techniques. Proper evaluation of keyphrase extraction requires large test collections with multiple opinions, currently not available for research. In this paper, we (i) present a set of test collections derived from various sources with multiple annotations (which we also refer to as opinions in the remainder of the paper) for each document, (ii) systematically evaluate keyphrase extraction using several supervised and unsupervised AKE techniques, and (iii) experimentally analyze the effects of disagreement on AKE evaluation. Our newly created set of test collections spans different types of topical content from general news and magazines, and each article is annotated multiple times by a large annotator panel. Our annotator study shows that for a given document there seems to be substantial disagreement on the preferred keyphrases, suggesting the need for multiple opinions per document. A first systematic evaluation of ranking and classification of keyphrases using both unsupervised and supervised AKE techniques on the test collections shows the superior effectiveness of supervised models, even with low annotation effort and with basic positional and frequency features, and highlights the importance of a suitable keyphrase candidate generation approach. We also study the influence of multiple opinions, training data, and document length on the evaluation of keyphrase extraction. Our new test collection for keyphrase extraction is one of the largest of its kind and will be made available to stimulate future work on more reliable evaluation of new keyphrase extractors.},
  author       = {Sterckx, Lucas and Demeester, Thomas and Deleu, Johannes and Develder, Chris},
  issn         = {1574-020X},
  journal      = {LANGUAGE RESOURCES AND EVALUATION},
  keyword      = {Automatic keyphrase extraction,Test collections,Annotator disagreement},
  language     = {eng},
  number       = {2},
  pages        = {503--532},
  publisher    = {Springer},
  title        = {Creation and evaluation of large keyphrase extraction collections with multiple opinions},
  url          = {http://dx.doi.org/10.1007/s10579-017-9395-6},
  volume       = {52},
  year         = {2018},
}
