Ghent University Academic Bibliography

Creation and evaluation of large keyphrase extraction collections with multiple opinions

Lucas Sterckx (UGent), Thomas Demeester (UGent), Johannes Deleu (UGent) and Chris Develder (UGent) (2018) LANGUAGE RESOURCES AND EVALUATION. 52(2). p. 503-532
abstract
While several automatic keyphrase extraction (AKE) techniques have been developed and analyzed, there is little consensus on the definition of the task and a lack of overview of the effectiveness of different techniques. Proper evaluation of keyphrase extraction requires large test collections with multiple opinions, currently not available for research. In this paper, we (i) present a set of test collections derived from various sources with multiple annotations (which we also refer to as opinions in the remainder of the paper) for each document, (ii) systematically evaluate keyphrase extraction using several supervised and unsupervised AKE techniques, and (iii) experimentally analyze the effects of disagreement on AKE evaluation. Our newly created set of test collections spans different types of topical content from general news and magazines, and is annotated with multiple annotations per article by a large annotator panel. Our annotator study shows that for a given document there seems to be large disagreement on the preferred keyphrases, suggesting the need for multiple opinions per document. A first systematic evaluation of ranking and classification of keyphrases using both unsupervised and supervised AKE techniques on the test collections shows a superior effectiveness of supervised models, even for a low annotation effort and with basic positional and frequency features, and highlights the importance of a suitable keyphrase candidate generation approach. We also study the influence of multiple opinions, training data and document length on evaluation of keyphrase extraction. Our new test collection for keyphrase extraction is one of the largest of its kind and will be made available to stimulate future work to improve reliable evaluation of new keyphrase extractors.
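The abstract mentions supervised AKE models built from basic positional and frequency features over generated keyphrase candidates. The sketch below is a hypothetical illustration of that general idea, not the authors' implementation: the candidate generation, the three features, and the logistic-regression classifier are all assumptions made for the example.

# Minimal sketch (not the paper's code) of supervised keyphrase candidate scoring
# with basic positional and frequency features, as described in the abstract.
import re
from sklearn.linear_model import LogisticRegression

def tokenize(text):
    """Lowercase word tokens; punctuation is dropped."""
    return re.findall(r"[a-z]+", text.lower())

def candidates(text, max_len=3):
    """Generate unique word n-grams (n <= max_len) as keyphrase candidates."""
    words = tokenize(text)
    return {" ".join(words[i:i + n])
            for n in range(1, max_len + 1)
            for i in range(len(words) - n + 1)}

def features(phrase, text):
    """Basic positional and frequency features for one candidate."""
    doc = " ".join(tokenize(text))       # normalized document string
    first = doc.find(phrase)             # >= 0 for candidates generated from this text
    return [
        first / max(len(doc), 1),                       # relative first-occurrence position
        doc.count(phrase) / max(len(doc.split()), 1),   # frequency normalized by length
        len(phrase.split()),                            # candidate length in words
    ]

def train(docs, gold):
    """Fit a binary classifier: is a candidate among the annotated keyphrases?
    `gold` is a list of sets of lowercased keyphrases, one set per document."""
    X, y = [], []
    for text, keys in zip(docs, gold):
        for c in candidates(text):
            X.append(features(c, text))
            y.append(int(c in keys))
    return LogisticRegression(max_iter=1000).fit(X, y)

def rank(model, text, top_k=5):
    """Score all candidates of a new document and return the top-k."""
    cands = sorted(candidates(text))
    probs = model.predict_proba([features(c, text) for c in cands])[:, 1]
    return sorted(zip(cands, probs), key=lambda p: -p[1])[:top_k]

For instance, rank(train(train_docs, train_keys), new_doc) would return the five highest-scoring candidate phrases under this illustrative feature set.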
Please use this URL to cite or link to this publication: http://hdl.handle.net/1854/LU-8563585
author
Lucas Sterckx (UGent), Thomas Demeester (UGent), Johannes Deleu (UGent) and Chris Develder (UGent)
organization
year
2018
type
journalArticle (original)
publication status
published
keyword
Automatic keyphrase extraction, Test collections, Annotator disagreement
journal title
LANGUAGE RESOURCES AND EVALUATION
Lang. Resour. Eval.
volume
52
issue
2
pages
30 pages
publisher
Springer
place of publication
Dordrecht
Web of Science type
Article
Web of Science id
000432144400006
ISSN
1574-020X
1574-0218
DOI
10.1007/s10579-017-9395-6
language
English
UGent publication?
yes
classification
A1
id
8563585
handle
http://hdl.handle.net/1854/LU-8563585
date created
2018-05-31 07:00:37
date last changed
2018-07-10 06:35:33
@article{8563585,
  abstract     = {While several automatic keyphrase extraction (AKE) techniques have been developed and analyzed, there is little consensus on the definition of the task and a lack of overview of the effectiveness of different techniques. Proper evaluation of keyphrase extraction requires large test collections with multiple opinions, currently not available for research. In this paper, we (i) present a set of test collections derived from various sources with multiple annotations (which we also refer to as opinions in the remainder of the paper) for each document, (ii) systematically evaluate keyphrase extraction using several supervised and unsupervised AKE techniques, and (iii) experimentally analyze the effects of disagreement on AKE evaluation. Our newly created set of test collections spans different types of topical content from general news and magazines, and is annotated with multiple annotations per article by a large annotator panel. Our annotator study shows that for a given document there seems to be large disagreement on the preferred keyphrases, suggesting the need for multiple opinions per document. A first systematic evaluation of ranking and classification of keyphrases using both unsupervised and supervised AKE techniques on the test collections shows a superior effectiveness of supervised models, even for a low annotation effort and with basic positional and frequency features, and highlights the importance of a suitable keyphrase candidate generation approach. We also study the influence of multiple opinions, training data and document length on evaluation of keyphrase extraction. Our new test collection for keyphrase extraction is one of the largest of its kind and will be made available to stimulate future work to improve reliable evaluation of new keyphrase extractors.},
  author       = {Sterckx, Lucas and Demeester, Thomas and Deleu, Johannes and Develder, Chris},
  issn         = {1574-020X},
  journal      = {LANGUAGE RESOURCES AND EVALUATION},
  keyword      = {Automatic keyphrase extraction,Test collections,Annotator disagreement},
  language     = {eng},
  number       = {2},
  pages        = {503--532},
  publisher    = {Springer},
  title        = {Creation and evaluation of large keyphrase extraction collections with multiple opinions},
  url          = {http://dx.doi.org/10.1007/s10579-017-9395-6},
  volume       = {52},
  year         = {2018},
}

Chicago
Sterckx, Lucas, Thomas Demeester, Johannes Deleu, and Chris Develder. 2018. “Creation and Evaluation of Large Keyphrase Extraction Collections with Multiple Opinions.” Language Resources and Evaluation 52 (2): 503–532.
APA
Sterckx, L., Demeester, T., Deleu, J., & Develder, C. (2018). Creation and evaluation of large keyphrase extraction collections with multiple opinions. LANGUAGE RESOURCES AND EVALUATION, 52(2), 503–532.
Vancouver
1.
Sterckx L, Demeester T, Deleu J, Develder C. Creation and evaluation of large keyphrase extraction collections with multiple opinions. LANGUAGE RESOURCES AND EVALUATION. Dordrecht: Springer; 2018;52(2):503–32.
MLA
Sterckx, Lucas, Thomas Demeester, Johannes Deleu, and Chris Develder. “Creation and Evaluation of Large Keyphrase Extraction Collections with Multiple Opinions.” LANGUAGE RESOURCES AND EVALUATION 52.2 (2018): 503–532. Print.