Annotating the Dutch parallel corpus

Paulussen, Hans; Macken, Lieve

Hans Paulussen and Lieve Macken (UGent)

(2010) NEALT Proceedings Series. 10. p.63-72

Abstract

The Dutch Parallel Corpus (DPC) is a translation corpus containing Dutch, English and French text samples aligned at sentence level. In addition to sentence alignment, the corpus has been grammatically annotated, which improves its exploitation in different domains, including natural language processing, translation research and CALL (computer-assisted language learning). In this paper, we describe the compilation of DPC and the alignment procedures used. This is followed by a description of the annotation task for the three languages, which required different tools and different tag sets. Finally, the impact of different grammatical annotations on multilingual corpus exploitation is discussed.

Keywords

linguistic annotation, parallel corpus

Citation

Please use this url to cite or link to this publication: http://hdl.handle.net/1854/LU-1085253

MLA: Paulussen, Hans, and Lieve Macken. “Annotating the Dutch Parallel Corpus.” NEALT Proceedings Series, edited by Lars Ahrenberg et al., vol. 10, Northern European Association for Language Technology (NEALT), 2010, pp. 63–72.
APA: Paulussen, H., & Macken, L. (2010). Annotating the Dutch parallel corpus. In L. Ahrenberg, J. Tiedemann, & M. Volk (Eds.), NEALT Proceedings Series (Vol. 10, pp. 63–72). Tartu, Estonia: Northern European Association for Language Technology (NEALT).
Chicago author-date: Paulussen, Hans, and Lieve Macken. 2010. “Annotating the Dutch Parallel Corpus.” In NEALT Proceedings Series, edited by Lars Ahrenberg, Jörg Tiedemann, and Martin Volk, 10:63–72. Tartu, Estonia: Northern European Association for Language Technology (NEALT).
Chicago author-date (all authors): Paulussen, Hans, and Lieve Macken. 2010. “Annotating the Dutch Parallel Corpus.” In NEALT Proceedings Series, edited by Lars Ahrenberg, Jörg Tiedemann, and Martin Volk, 10:63–72. Tartu, Estonia: Northern European Association for Language Technology (NEALT).
Vancouver: 1. Paulussen H, Macken L. Annotating the Dutch parallel corpus. In: Ahrenberg L, Tiedemann J, Volk M, editors. NEALT Proceedings Series. Tartu, Estonia: Northern European Association for Language Technology (NEALT); 2010. p. 63–72.
IEEE: [1] H. Paulussen and L. Macken, “Annotating the Dutch parallel corpus,” in NEALT Proceedings Series, Tartu, Estonia, 2010, vol. 10, pp. 63–72.

@inproceedings{1085253,
  abstract     = {{The Dutch Parallel Corpus (DPC) is a translation corpus containing Dutch, English and French text samples aligned at sentence level. Next to sentence alignment, the corpus has also been grammatically annotated, thus improving exploitation for different domains, including natural language processing, translation research or CALL (computer-assisted language learning). In this paper, we describe the compilation of DPC and the alignment procedures used. This is followed by a description of the annotation task for the three languages, which required different tools and different tag sets. Finally the impact of different grammatical annotations on multilingual corpus exploitation is discussed.}},
  author       = {{Paulussen, Hans and Macken, Lieve}},
  booktitle    = {{NEALT Proceedings Series}},
  editor       = {{Ahrenberg, Lars and Tiedemann, Jörg and Volk, Martin}},
  issn         = {{1736-6305}},
  keywords     = {{linguistic annotation,parallel corpus}},
  language     = {{eng}},
  location     = {{Tartu, Estonia}},
  pages        = {{63--72}},
  publisher    = {{Northern European Association for Language Technology (NEALT)}},
  title        = {{Annotating the Dutch parallel corpus}},
  volume       = {{10}},
  year         = {{2010}},
}
