Ghent University Academic Bibliography

Evaluating automatic cross-domain Dutch semantic role annotation

Orphée De Clercq (UGent), Veronique Hoste (UGent) and Paola Monachesi (2012). LREC 2012 : eight international conference on language resources and evaluation. p. 88-93
abstract
In this paper we present the first corpus where one million Dutch words from a variety of text genres have been annotated with semantic roles. 500K have been completely manually verified and used as training material to automatically label another 500K. All data has been annotated following an adapted version of the PropBank guidelines. The corpus’s rich text type diversity and the availability of manually verified syntactic dependency structures allowed us to experiment with an existing semantic role labeler for Dutch. In order to test the system’s portability across various domains, we experimented with training on individual domains and compared this with training on multiple domains by adding more data. Our results show that training on large data sets is necessary but that including genre-specific training material is also crucial to optimize classification. We observed that a small amount of in-domain training data is already sufficient to improve our semantic role labeler.
Please use this url to cite or link to this publication: http://hdl.handle.net/1854/LU-2129097
author: Orphée De Clercq, Veronique Hoste and Paola Monachesi
organization:
year: 2012
type: conference
publication status: published
subject:
keyword: cross-domain, semantic role labeling, corpus annotation
in: LREC 2012 : eight international conference on language resources and evaluation
editor: Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Mehmet Uğur Doğan, Bente Maegaard, Joseph Mariani, Jan Odijk and Stelios Piperidis
pages: 88-93
publisher: European Language Resources Association (ELRA)
place of publication: Paris, France
conference name: 8th International conference on Language Resources and Evaluation (LREC 2012)
conference location: Istanbul, Turkey
conference start: 2012-05-21
conference end: 2012-05-27
Web of Science type: Proceedings Paper
Web of Science id: 000323927700015
ISBN: 9782951740877
language: English
UGent publication?: yes
classification: P1
copyright statement: I have transferred the copyright for this publication to the publisher
id: 2129097
handle: http://hdl.handle.net/1854/LU-2129097
date created: 2012-06-01 14:44:44
date last changed: 2015-06-17 10:04:13
@inproceedings{2129097,
  abstract     = {In this paper we present the first corpus where one million Dutch words from a variety of text genres have been annotated with semantic roles. 500K have been completely manually verified and used as training material to automatically label another 500K. All data has been annotated following an adapted version of the PropBank guidelines. The corpus{\textquoteright}s rich text type diversity and the availability of manually verified syntactic dependency structures allowed us to experiment with an existing semantic role labeler for Dutch. In order to test the system{\textquoteright}s portability across various domains, we experimented with training on individual domains and compared this with training on multiple domains by adding more data. Our results show that training on large data sets is necessary but that including genre-specific training material is also crucial to optimize classification. We observed that a small amount of in-domain training data is already sufficient to improve our semantic role labeler.},
  author       = {De Clercq, Orph{\'e}e and Hoste, Veronique and Monachesi, Paola},
  booktitle    = {LREC 2012 : eight international conference on language resources and evaluation},
  editor       = {Calzolari, Nicoletta and Choukri, Khalid and Declerck, Thierry and U\u{g}ur Do\u{g}an, Mehmet and Maegaard, Bente and Mariani, Joseph and Odijk, Jan and Piperidis, Stelios},
  isbn         = {9782951740877},
  keyword      = {cross-domain,semantic role labeling,corpus annotation},
  language     = {eng},
  location     = {Istanbul, Turkey},
  pages        = {88--93},
  publisher    = {European Language Resources Association (ELRA)},
  title        = {Evaluating automatic cross-domain Dutch semantic role annotation},
  year         = {2012},
}

Chicago
De Clercq, Orphée, Veronique Hoste, and Paola Monachesi. 2012. “Evaluating Automatic Cross-domain Dutch Semantic Role Annotation.” In LREC 2012 : Eight International Conference on Language Resources and Evaluation, ed. Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Mehmet Uğur Doğan, Bente Maegaard, Joseph Mariani, Jan Odijk, and Stelios Piperidis, 88–93. Paris, France: European Language Resources Association (ELRA).
APA
De Clercq, O., Hoste, V., & Monachesi, P. (2012). Evaluating automatic cross-domain Dutch semantic role annotation. In N. Calzolari, K. Choukri, T. Declerck, M. Uğur Doğan, B. Maegaard, J. Mariani, J. Odijk, et al. (Eds.), LREC 2012 : eight international conference on language resources and evaluation (pp. 88–93). Presented at the 8th International conference on Language Resources and Evaluation (LREC 2012), Paris, France: European Language Resources Association (ELRA).
Vancouver
1. De Clercq O, Hoste V, Monachesi P. Evaluating automatic cross-domain Dutch semantic role annotation. In: Calzolari N, Choukri K, Declerck T, Uğur Doğan M, Maegaard B, Mariani J, et al., editors. LREC 2012 : eight international conference on language resources and evaluation. Paris, France: European Language Resources Association (ELRA); 2012. p. 88–93.
MLA
De Clercq, Orphée, Veronique Hoste, and Paola Monachesi. “Evaluating Automatic Cross-domain Dutch Semantic Role Annotation.” LREC 2012 : Eight International Conference on Language Resources and Evaluation. Ed. Nicoletta Calzolari et al. Paris, France: European Language Resources Association (ELRA), 2012. 88–93. Print.