
Building a new-generation corpus for empirical translation studies : the Dutch Parallel Corpus 2.0

Ryan Reynaert, Lieve Macken (UGent), Arda Tezcan (UGent) and Gert De Sutter (UGent)
Abstract
This chapter introduces a new, updated version of the Dutch Parallel Corpus, a bidirectional parallel corpus of expert translations for the Dutch><English and Dutch><French language pairs. This revised version of the corpus, which we dub Dutch Parallel Corpus 2.0, is dynamic in nature and contains 2.75 million words at the time of writing. The corpus is sentence-aligned, lemmatized and POS-tagged using the state-of-the-art natural language processing toolkit Stanza. Compared to its predecessor, the Dutch Parallel Corpus 2.0 contains more metadata about the translators (e.g. gender, education, experience) and the translation projects (e.g. L1/L2 translation, software used, degree and type of revision), in addition to the traditional metadata about the texts themselves (e.g. source and target language, intended audience, intended goal, register). This extensive set of metadata, together with a more principled and flexible register classification, is considered the main asset of the corpus. It is meant to stimulate corpus-based translation scholars to answer more refined research questions about the linguistic and contextual factors that shape translated texts and, ultimately, to foster ideas and theories about the social and cognitive processes involved in translation performance. The corpus is freely available for research purposes via https://www.dpc2.ugent.be/.
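The abstract describes two layers that a user would typically combine: token-level annotation (lemmas and Universal PoS tags, as produced by Stanza) and metadata about translators, projects and texts. The Python sketch below shows one way such a combination might look: select sentence pairs by translator and project metadata, then count UPOS tags on the target side. It is illustrative only. The record layout, the field names (`into_l1`, `years_experience`, etc.), the helper functions `select` and `upos_profile`, and all example values are hypothetical; they do not describe the actual DPC 2.0 download format or query interface.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple

# Hypothetical token layout: (word form, lemma, Universal PoS tag).
Token = Tuple[str, str, str]


@dataclass
class Translator:
    gender: str
    education: str
    years_experience: int


@dataclass
class Project:
    into_l1: bool   # True for L1 translation, False for L2 translation
    software: str   # tool used during translation (illustrative value)
    revision: str   # degree/type of revision (illustrative encoding)


@dataclass
class AlignedPair:
    source_lang: str
    target_lang: str
    register: str
    source: List[Token]
    target: List[Token]
    translator: Translator
    project: Project


def select(pairs: Iterable[AlignedPair],
           keep: Callable[[AlignedPair], bool]) -> List[AlignedPair]:
    """Return the sentence pairs whose metadata satisfy the predicate."""
    return [p for p in pairs if keep(p)]


def upos_profile(pairs: Iterable[AlignedPair], side: str = "target") -> Counter:
    """Count Universal PoS tags on one side ("source" or "target") of the pairs."""
    if side not in ("source", "target"):
        raise ValueError("side must be 'source' or 'target'")
    counts: Counter = Counter()
    for p in pairs:
        tokens = p.target if side == "target" else p.source
        counts.update(tag for _form, _lemma, tag in tokens)
    return counts


# Invented example data: two aligned sentence pairs with made-up metadata.
senior = Translator(gender="F", education="MA Translation", years_experience=12)
junior = Translator(gender="M", education="BA Applied Linguistics", years_experience=2)

pairs = [
    AlignedPair("nl", "en", "register_a",
                source=[("De", "de", "DET"), ("kat", "kat", "NOUN"),
                        ("slaapt", "slapen", "VERB"), (".", ".", "PUNCT")],
                target=[("The", "the", "DET"), ("cat", "cat", "NOUN"),
                        ("sleeps", "sleep", "VERB"), (".", ".", "PUNCT")],
                translator=senior,
                project=Project(into_l1=True, software="none", revision="full")),
    AlignedPair("nl", "fr", "register_b",
                source=[("Hij", "hij", "PRON"), ("leest", "lezen", "VERB"),
                        (".", ".", "PUNCT")],
                target=[("Il", "il", "PRON"), ("lit", "lire", "VERB"),
                        (".", ".", "PUNCT")],
                translator=junior,
                project=Project(into_l1=False, software="none", revision="none")),
]

# Experienced translators (5+ years) working into their L1.
selected = select(pairs, lambda p: p.translator.years_experience >= 5
                  and p.project.into_l1)
profile = upos_profile(selected)
print(profile)
```

Keeping metadata as structured fields rather than free text is what makes compound filters like this one (experience combined with L1/L2 direction) straightforward; a real analysis would need to be adapted to the format in which the corpus is actually distributed.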
Keywords
Parallel corpus, Corpus compilation, Linguistic annotation, Stanza, Universal PoS, Rich metadata, Register classification, eqtis, lt3

Downloads

  • (...).pdf: full text (published version), PDF, 290.22 KB, UGent only

Citation


MLA
Reynaert, Ryan, et al. “Building a New-Generation Corpus for Empirical Translation Studies : The Dutch Parallel Corpus 2.0.” New Perspectives on Corpus Translation Studies, edited by Vincent Wang et al., Springer, 2021, pp. 75–100, doi:10.1007/978-981-16-4918-9_4.
APA
Reynaert, R., Macken, L., Tezcan, A., & De Sutter, G. (2021). Building a new-generation corpus for empirical translation studies : the Dutch Parallel Corpus 2.0. In V. Wang, L. Lim, & D. Li (Eds.), New perspectives on corpus translation studies (pp. 75–100). https://doi.org/10.1007/978-981-16-4918-9_4
Chicago author-date
Reynaert, Ryan, Lieve Macken, Arda Tezcan, and Gert De Sutter. 2021. “Building a New-Generation Corpus for Empirical Translation Studies : The Dutch Parallel Corpus 2.0.” In New Perspectives on Corpus Translation Studies, edited by Vincent Wang, Lily Lim, and Defeng Li, 75–100. Singapore: Springer. https://doi.org/10.1007/978-981-16-4918-9_4.
Chicago author-date (all authors)
Reynaert, Ryan, Lieve Macken, Arda Tezcan, and Gert De Sutter. 2021. “Building a New-Generation Corpus for Empirical Translation Studies : The Dutch Parallel Corpus 2.0.” In New Perspectives on Corpus Translation Studies, edited by Vincent Wang, Lily Lim, and Defeng Li, 75–100. Singapore: Springer. doi:10.1007/978-981-16-4918-9_4.
Vancouver
1. Reynaert R, Macken L, Tezcan A, De Sutter G. Building a new-generation corpus for empirical translation studies : the Dutch Parallel Corpus 2.0. In: Wang V, Lim L, Li D, editors. New perspectives on corpus translation studies. Singapore: Springer; 2021. p. 75–100.
IEEE
[1] R. Reynaert, L. Macken, A. Tezcan, and G. De Sutter, “Building a new-generation corpus for empirical translation studies : the Dutch Parallel Corpus 2.0,” in New perspectives on corpus translation studies, V. Wang, L. Lim, and D. Li, Eds. Singapore: Springer, 2021, pp. 75–100.
BibTeX
@incollection{8723179,
  abstract     = {{This chapter introduces a new, updated version of the Dutch Parallel Corpus, a bidirectional parallel corpus of expert translations for Dutch><English and Dutch><French language pairs. This revisited version of the corpus, which we dub Dutch Parallel Corpus 2.0, is dynamic in nature, and contains 2.75 million words at the time of writing. The corpus is sentence-aligned, lemmatized and POS-tagged using the state-of-the-art natural language processing toolkit Stanza. Compared to its predecessor, the Dutch Parallel Corpus 2.0 contains more metadata about the translators (e.g. gender, education, experience) and the translation projects (e.g. L1/L2 translation, software used, degree and type of revision), next to the traditional metadata about the texts themselves (e.g. source and target language, intended audience, intended goal, register). The availability of an extensive set of metadata is considered the main asset of this corpus, together with a more principled and flexible register classification, thus stimulating corpus-based translation scholars to answer more refined research questions about the linguistic and contextual factors that shape translated texts, and ultimately fostering ideas and theories about the social and cognitive processes involved in translation performance. The corpus is freely available for research purposes via https://www.dpc2.ugent.be/.}},
  author       = {{Reynaert, Ryan and Macken, Lieve and Tezcan, Arda and De Sutter, Gert}},
  booktitle    = {{New perspectives on corpus translation studies}},
  editor       = {{Wang, Vincent and Lim, Lily and Li, Defeng}},
  isbn         = {{9789811649172}},
  issn         = {{2197-8689}},
  keywords     = {{Parallel corpus,Corpus compilation,Linguistic annotation,Stanza,Universal PoS,Rich metadata,Register classification,eqtis,lt3}},
  language     = {{eng}},
  pages        = {{75--100}},
  publisher    = {{Springer}},
  series       = {{New Frontiers in Translation Studies}},
  title        = {{Building a new-generation corpus for empirical translation studies : the Dutch Parallel Corpus 2.0}},
  url          = {{http://dx.doi.org/10.1007/978-981-16-4918-9_4}},
  year         = {{2021}},
}
