Ghent University Academic Bibliography

Annotating the Dutch parallel corpus

Hans Paulussen and Lieve Macken (UGent) (2010) NEALT Proceedings Series. 10. p.63-72
abstract
The Dutch Parallel Corpus (DPC) is a translation corpus containing Dutch, English and French text samples aligned at sentence level. Next to sentence alignment, the corpus has also been grammatically annotated, thus improving exploitation for different domains, including natural language processing, translation research or CALL (computer-assisted language learning). In this paper, we describe the compilation of DPC and the alignment procedures used. This is followed by a description of the annotation task for the three languages, which required different tools and different tag sets. Finally the impact of different grammatical annotations on multilingual corpus exploitation is discussed.
Please use this url to cite or link to this publication: http://hdl.handle.net/1854/LU-1085253
author
Hans Paulussen and Lieve Macken
organization
year
2010
type
conference
publication status
published
subject
keyword
linguistic annotation, parallel corpus
in
NEALT Proceedings Series
editor
Lars Ahrenberg, Jörg Tiedemann and Martin Volk
volume
10
pages
63 - 72
publisher
Northern European Association for Language Technology (NEALT)
place of publication
Tartu, Estonia
conference name
2010 Workshop on Annotation and Exploitation of Parallel Corpora (AEPC 2010)
conference location
Tartu, Estonia
conference start
2010-12-02
conference end
2010-12-02
ISSN
1736-6305
language
English
UGent publication?
yes
classification
C1
copyright statement
I have retained and own the full copyright for this publication
VABB id
c:vabb:356406
VABB type
VABB-5
id
1085253
handle
http://hdl.handle.net/1854/LU-1085253
date created
2010-12-08 16:30:50
date last changed
2017-01-02 09:52:19
@inproceedings{1085253,
  abstract     = {The Dutch Parallel Corpus (DPC) is a translation corpus containing Dutch, English and French text samples aligned at sentence level. Next to sentence alignment, the corpus has also been grammatically annotated, thus improving exploitation for different domains, including natural language processing, translation research or CALL (computer-assisted language learning). In this paper, we describe the compilation of DPC and the alignment procedures used. This is followed by a description of the annotation task for the three languages, which required different tools and different tag sets. Finally the impact of different grammatical annotations on multilingual corpus exploitation is discussed.},
  author       = {Paulussen, Hans and Macken, Lieve},
  booktitle    = {NEALT Proceedings Series},
  editor       = {Ahrenberg, Lars and Tiedemann, J{\"o}rg and Volk, Martin},
  issn         = {1736-6305},
  keyword      = {linguistic annotation,parallel corpus},
  language     = {eng},
  location     = {Tartu, Estonia},
  pages        = {63--72},
  publisher    = {Northern European Association for Language Technology (NEALT)},
  title        = {Annotating the Dutch parallel corpus},
  volume       = {10},
  year         = {2010},
}

Chicago
Paulussen, Hans, and Lieve Macken. 2010. “Annotating the Dutch Parallel Corpus.” In NEALT Proceedings Series, ed. Lars Ahrenberg, Jörg Tiedemann, and Martin Volk, 10:63–72. Tartu, Estonia: Northern European Association for Language Technology (NEALT).
APA
Paulussen, H., & Macken, L. (2010). Annotating the Dutch parallel corpus. In L. Ahrenberg, J. Tiedemann, & M. Volk (Eds.), NEALT Proceedings Series (Vol. 10, pp. 63–72). Presented at the 2010 Workshop on Annotation and Exploitation of Parallel Corpora (AEPC 2010), Tartu, Estonia: Northern European Association for Language Technology (NEALT).
Vancouver
1. Paulussen H, Macken L. Annotating the Dutch parallel corpus. In: Ahrenberg L, Tiedemann J, Volk M, editors. NEALT Proceedings Series. Tartu, Estonia: Northern European Association for Language Technology (NEALT); 2010. p. 63–72.
MLA
Paulussen, Hans, and Lieve Macken. “Annotating the Dutch Parallel Corpus.” NEALT Proceedings Series. Ed. Lars Ahrenberg, Jörg Tiedemann, and Martin Volk. Vol. 10. Tartu, Estonia: Northern European Association for Language Technology (NEALT), 2010. 63–72. Print.