
Annotating the Dutch parallel corpus

Author
Hans Paulussen, Lieve Macken
Abstract
The Dutch Parallel Corpus (DPC) is a translation corpus containing Dutch, English and French text samples aligned at sentence level. In addition to sentence alignment, the corpus has been grammatically annotated, which makes it more useful in a range of domains, including natural language processing, translation research and CALL (computer-assisted language learning). In this paper, we describe the compilation of DPC and the alignment procedures used. This is followed by a description of the annotation task for the three languages, which required different tools and different tag sets. Finally, the impact of different grammatical annotations on multilingual corpus exploitation is discussed.
Keywords
linguistic annotation, parallel corpus

Citation

MLA
Paulussen, Hans, and Lieve Macken. “Annotating the Dutch Parallel Corpus.” NEALT Proceedings Series, edited by Lars Ahrenberg et al., vol. 10, Northern European Association for Language Technology (NEALT), 2010, pp. 63–72.
APA
Paulussen, H., & Macken, L. (2010). Annotating the Dutch parallel corpus. In L. Ahrenberg, J. Tiedemann, & M. Volk (Eds.), NEALT Proceedings Series (Vol. 10, pp. 63–72). Tartu, Estonia: Northern European Association for Language Technology (NEALT).
Chicago author-date
Paulussen, Hans, and Lieve Macken. 2010. “Annotating the Dutch Parallel Corpus.” In NEALT Proceedings Series, edited by Lars Ahrenberg, Jörg Tiedemann, and Martin Volk, 10:63–72. Tartu, Estonia: Northern European Association for Language Technology (NEALT).
Vancouver
1. Paulussen H, Macken L. Annotating the Dutch parallel corpus. In: Ahrenberg L, Tiedemann J, Volk M, editors. NEALT Proceedings Series. Tartu, Estonia: Northern European Association for Language Technology (NEALT); 2010. p. 63–72.
IEEE
[1] H. Paulussen and L. Macken, “Annotating the Dutch parallel corpus,” in NEALT Proceedings Series, Tartu, Estonia, 2010, vol. 10, pp. 63–72.
@inproceedings{1085253,
  abstract     = {{The Dutch Parallel Corpus (DPC) is a translation corpus containing Dutch, English and French text samples aligned at sentence level. Next to sentence alignment, the corpus has also been grammatically annotated, thus improving exploitation for different domains, including natural language processing, translation research or CALL (computer-assisted language learning). In this paper, we describe the compilation of DPC and the alignment procedures used. This is followed by a description of the annotation task for the three languages, which required different tools and different tag sets. Finally the impact of different grammatical annotations on multilingual corpus exploitation is discussed.}},
  author       = {{Paulussen, Hans and Macken, Lieve}},
  booktitle    = {{NEALT Proceedings Series}},
  editor       = {{Ahrenberg, Lars and Tiedemann, Jörg and Volk, Martin}},
  issn         = {{1736-6305}},
  keywords     = {{linguistic annotation,parallel corpus}},
  language     = {{eng}},
  location     = {{Tartu, Estonia}},
  pages        = {{63--72}},
  publisher    = {{Northern European Association for Language Technology (NEALT)}},
  title        = {{Annotating the Dutch parallel corpus}},
  volume       = {{10}},
  year         = {{2010}},
}