Annotating the Dutch parallel corpus

Authors
Hans Paulussen, Lieve Macken
Abstract
The Dutch Parallel Corpus (DPC) is a translation corpus containing Dutch, English and French text samples aligned at sentence level. In addition to sentence alignment, the corpus has been grammatically annotated, which makes it easier to exploit in different domains, including natural language processing, translation research and CALL (computer-assisted language learning). In this paper, we describe the compilation of DPC and the alignment procedures used. This is followed by a description of the annotation task for the three languages, which required different tools and different tag sets. Finally, the impact of different grammatical annotations on multilingual corpus exploitation is discussed.
Keywords
linguistic annotation, parallel corpus

Citation

Chicago
Paulussen, Hans, and Lieve Macken. 2010. “Annotating the Dutch Parallel Corpus.” In NEALT Proceedings Series, ed. Lars Ahrenberg, Jörg Tiedemann, and Martin Volk, 10:63–72. Tartu, Estonia: Northern European Association for Language Technology (NEALT).
APA
Paulussen, H., & Macken, L. (2010). Annotating the Dutch parallel corpus. In L. Ahrenberg, J. Tiedemann, & M. Volk (Eds.), NEALT Proceedings Series (Vol. 10, pp. 63–72). Presented at the 2010 Workshop on Annotation and Exploitation of Parallel Corpora (AEPC 2010), Tartu, Estonia: Northern European Association for Language Technology (NEALT).
Vancouver
1. Paulussen H, Macken L. Annotating the Dutch parallel corpus. In: Ahrenberg L, Tiedemann J, Volk M, editors. NEALT Proceedings Series. Tartu, Estonia: Northern European Association for Language Technology (NEALT); 2010. p. 63–72.
MLA
Paulussen, Hans, and Lieve Macken. “Annotating the Dutch Parallel Corpus.” NEALT Proceedings Series. Ed. Lars Ahrenberg, Jörg Tiedemann, & Martin Volk. Vol. 10. Tartu, Estonia: Northern European Association for Language Technology (NEALT), 2010. 63–72. Print.
BibTeX
@inproceedings{1085253,
  abstract     = {The Dutch Parallel Corpus (DPC) is a translation corpus containing Dutch, English and French text samples aligned at sentence level. Next to sentence alignment, the corpus has also been grammatically annotated, thus improving exploitation for different domains, including natural language processing, translation research or CALL (computer-assisted language learning). In this paper, we describe the compilation of DPC and the alignment procedures used. This is followed by a description of the annotation task for the three languages, which required different tools and different tag sets. Finally the impact of different grammatical annotations on multilingual corpus exploitation is discussed.},
  author       = {Paulussen, Hans and Macken, Lieve},
  booktitle    = {NEALT Proceedings Series},
  editor       = {Ahrenberg, Lars and Tiedemann, J{\"o}rg and Volk, Martin},
  issn         = {1736-6305},
  keyword      = {linguistic annotation,parallel corpus},
  language     = {eng},
  location     = {Tartu, Estonia},
  pages        = {63--72},
  publisher    = {Northern European Association for Language Technology (NEALT)},
  title        = {Annotating the Dutch parallel corpus},
  volume       = {10},
  year         = {2010},
}