Constructing a cross-document event coreference corpus for Dutch

Loic De Langhe (UGent) , Orphée De Clercq (UGent) and Veronique Hoste (UGent)
Abstract
Event coreference resolution is a task in which different text fragments that refer to the same real-world event are automatically linked together. This task can be performed not only within a single document but also across different documents and can serve as a basis for many useful Natural Language Processing applications. Resources for this type of research, however, are extremely limited. We compiled the first large-scale dataset for cross-document event coreference resolution in Dutch, comparable in size to the most widely used English event coreference corpora. As data for event coreference is notoriously sparse, we took additional steps to maximize the number of coreference links in our corpus. Due to the complex nature of event coreference resolution, many algorithms consist of pipeline architectures which rely on a series of upstream tasks such as event detection, event argument identification and argument coreference. We tackle the task of event argument coreference to both illustrate the potential of our compiled corpus and to lay the groundwork for a Dutch event coreference resolution system in the future. Results show that existing NLP algorithms can be easily retrofitted to contribute to the subtasks of an event coreference resolution pipeline system.
Keywords
Event coreference resolution, Event annotation, Entity coreference resolution
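
To make the task described in the abstract concrete, the following is a minimal sketch of how pairwise coreference links between event mentions can be grouped into chains, some of which span more than one document. It is not the authors' system or annotation format: the `Mention` type, the `build_chains` and `is_cross_document` helpers, and the example mentions are all hypothetical and invented for illustration, and the grouping uses a plain union-find over links that are assumed to be given.

```python
from collections import defaultdict
from typing import NamedTuple


class Mention(NamedTuple):
    doc_id: str  # hypothetical document identifier
    text: str    # text fragment that refers to an event


def build_chains(mentions, links):
    """Group mentions into coreference chains from pairwise links (union-find)."""
    parent = {m: m for m in mentions}

    def find(m):
        # Walk up to the representative mention, shortening the path as we go.
        while parent[m] != m:
            parent[m] = parent[parent[m]]
            m = parent[m]
        return m

    for a, b in links:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    chains = defaultdict(list)
    for m in mentions:
        chains[find(m)].append(m)
    return list(chains.values())


def is_cross_document(chain):
    """A chain is cross-document if its mentions come from more than one document."""
    return len({m.doc_id for m in chain}) > 1


# Invented example data: three mentions of one event in two documents,
# plus one unrelated event mention.
mentions = [
    Mention("doc1", "the earthquake"),
    Mention("doc1", "the quake"),
    Mention("doc2", "Monday's tremor"),
    Mention("doc2", "the election"),
]
links = [(mentions[0], mentions[1]), (mentions[1], mentions[2])]

chains = build_chains(mentions, links)
cross_document_chains = [c for c in chains if is_cross_document(c)]
```

Here the first three mentions end up in a single chain that crosses the document boundary, while the unlinked mention forms a singleton chain. In a real pipeline the links would come from a trained model or from manual annotation rather than being listed by hand.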

Downloads

  • Constructing a Cross Document Event Coreference Corpus for Dutch-2.pdf
    • full text (Accepted manuscript), open access, PDF, 556.24 KB
  • (...).pdf
    • full text (Published version), UGent only, PDF, 1.24 MB

Citation

MLA
De Langhe, Loic, et al. “Constructing a Cross-Document Event Coreference Corpus for Dutch.” LANGUAGE RESOURCES AND EVALUATION, vol. 57, no. 2, 2023, pp. 819–48, doi:10.1007/s10579-022-09597-1.
APA
De Langhe, L., De Clercq, O., & Hoste, V. (2023). Constructing a cross-document event coreference corpus for Dutch. LANGUAGE RESOURCES AND EVALUATION, 57(2), 819–848. https://doi.org/10.1007/s10579-022-09597-1
Chicago author-date
De Langhe, Loic, Orphée De Clercq, and Veronique Hoste. 2023. “Constructing a Cross-Document Event Coreference Corpus for Dutch.” LANGUAGE RESOURCES AND EVALUATION 57 (2): 819–48. https://doi.org/10.1007/s10579-022-09597-1.
Chicago author-date (all authors)
De Langhe, Loic, Orphée De Clercq, and Veronique Hoste. 2023. “Constructing a Cross-Document Event Coreference Corpus for Dutch.” LANGUAGE RESOURCES AND EVALUATION 57 (2): 819–848. doi:10.1007/s10579-022-09597-1.
Vancouver
1. De Langhe L, De Clercq O, Hoste V. Constructing a cross-document event coreference corpus for Dutch. LANGUAGE RESOURCES AND EVALUATION. 2023;57(2):819–48.
IEEE
[1] L. De Langhe, O. De Clercq, and V. Hoste, “Constructing a cross-document event coreference corpus for Dutch,” LANGUAGE RESOURCES AND EVALUATION, vol. 57, no. 2, pp. 819–848, 2023.
BibTeX
@article{8755851,
  abstract     = {{Event coreference resolution is a task in which different text fragments that refer to the same real-world event are automatically linked together. This task can be performed not only within a single document but also across different documents and can serve as a basis for many useful Natural Language Processing applications. Resources for this type of research, however, are extremely limited. We compiled the first large-scale dataset for cross-document event coreference resolution in Dutch, comparable in size to the most widely used English event coreference corpora. As data for event coreference is notoriously sparse, we took additional steps to maximize the number of coreference links in our corpus. Due to the complex nature of event coreference resolution, many algorithms consist of pipeline architectures which rely on a series of upstream tasks such as event detection, event argument identification and argument coreference. We tackle the task of event argument coreference to both illustrate the potential of our compiled corpus and to lay the groundwork for a Dutch event coreference resolution system in the future. Results show that existing NLP algorithms can be easily retrofitted to contribute to the subtasks of an event coreference resolution pipeline system.}},
  author       = {{De Langhe, Loic and De Clercq, Orphée and Hoste, Veronique}},
  issn         = {{1574-020X}},
  journal      = {{LANGUAGE RESOURCES AND EVALUATION}},
  keywords     = {{Event coreference resolution,Event annotation,Entity coreference resolution}},
  language     = {{eng}},
  number       = {{2}},
  pages        = {{819--848}},
  title        = {{Constructing a cross-document event coreference corpus for Dutch}},
  url          = {{http://doi.org/10.1007/s10579-022-09597-1}},
  volume       = {{57}},
  year         = {{2023}},
}
