
LeConTra : a learner corpus of English-to-Dutch news translation

Bram Vanroy (UGent) and Lieve Macken (UGent)
Abstract
We present LeConTra, a learner corpus consisting of English-to-Dutch news translations enriched with translation process data. Three students of a Master's programme in Translation were asked to translate 50 different English journalistic texts of approximately 250 tokens each. Because we also collected translation process data in the form of keystroke logging, our dataset can be used as part of different research strands such as translation process research, learner corpus research, and corpus-based translation studies. Reference translations, without process data, are also included. The data has been manually segmented and tokenized, and manually aligned at both segment and word level, leading to a high-quality corpus with token-level process data. The data is freely accessible via the Translation Process Research DataBase, which underlines our commitment to distributing our dataset. The tool that was built for manual sentence segmentation and tokenization, Mantis, is also available as an open-source aid for data processing.
Keywords
datasets, learner corpus, translation process, translation studies, LT3

Downloads

  • 2022.lrec-1.192 1 .pdf
    • full text (Published version) | open access | PDF | 409.50 KB

Citation

Please use this url to cite or link to this publication:

MLA
Vanroy, Bram, and Lieve Macken. “LeConTra : A Learner Corpus of English-to-Dutch News Translation.” LREC 2022 : THIRTEENTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION, edited by Nicoletta Calzolari et al., European Language Resources Association (ELRA), 2022, pp. 1807–16.
APA
Vanroy, B., & Macken, L. (2022). LeConTra : a learner corpus of English-to-Dutch news translation. In N. Calzolari, F. Béchet, P. Blache, K. Choukri, C. Cieri, T. Declerck, … S. Piperidis (Eds.), LREC 2022 : THIRTEENTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION (pp. 1807–1816). Marseille, France: European Language Resources Association (ELRA).
Chicago author-date
Vanroy, Bram, and Lieve Macken. 2022. “LeConTra : A Learner Corpus of English-to-Dutch News Translation.” In LREC 2022 : THIRTEENTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION, edited by Nicoletta Calzolari, Frédéric Béchet, Philippe Blache, Khalid Choukri, Christopher Cieri, Thierry Declerck, Sara Goggi, et al., 1807–16. Marseille, France: European Language Resources Association (ELRA).
Chicago author-date (all authors)
Vanroy, Bram, and Lieve Macken. 2022. “LeConTra : A Learner Corpus of English-to-Dutch News Translation.” In LREC 2022 : THIRTEENTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION, edited by Nicoletta Calzolari, Frédéric Béchet, Philippe Blache, Khalid Choukri, Christopher Cieri, Thierry Declerck, Sara Goggi, Hitoshi Isahara, Bente Maegaard, Joseph Mariani, Hélène Mazo, Jan Odijk, and Stelios Piperidis, 1807–1816. Marseille, France: European Language Resources Association (ELRA).
Vancouver
1. Vanroy B, Macken L. LeConTra : a learner corpus of English-to-Dutch news translation. In: Calzolari N, Béchet F, Blache P, Choukri K, Cieri C, Declerck T, et al., editors. LREC 2022 : THIRTEENTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION. Marseille, France: European Language Resources Association (ELRA); 2022. p. 1807–16.
IEEE
[1] B. Vanroy and L. Macken, “LeConTra : a learner corpus of English-to-Dutch news translation,” in LREC 2022 : THIRTEENTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION, Marseille, France, 2022, pp. 1807–1816.
@inproceedings{8756778,
  abstract     = {{We present LeConTra, a learner corpus consisting of English-to-Dutch news translations enriched with translation process data. Three students of a Master's programme in Translation were asked to translate 50 different English journalistic texts of approximately 250 tokens each. Because we also collected translation process data in the form of keystroke logging, our dataset can be used as part of different research strands such as translation process research, learner corpus research, and corpus-based translation studies. Reference translations, without process data, are also included. The data has been manually segmented and tokenized, and manually aligned at both segment and word level, leading to a high-quality corpus with token-level process data. The data is freely accessible via the Translation Process Research DataBase, which emphasises our commitment of distributing our dataset. The tool that was built for manual sentence segmentation and tokenization, Mantis, is also available as an open-source aid for data processing.}},
  author       = {{Vanroy, Bram and Macken, Lieve}},
  booktitle    = {{LREC 2022 : THIRTEENTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION}},
  editor       = {{Calzolari, Nicoletta and Béchet, Frédéric and Blache, Philippe and Choukri, Khalid and Cieri, Christopher and Declerck, Thierry and Goggi, Sara and Isahara, Hitoshi and Maegaard, Bente and Mariani, Joseph and Mazo, Hélène and Odijk, Jan and Piperidis, Stelios}},
  isbn         = {{9791095546726}},
  issn         = {{2522-2686}},
  keywords     = {{datasets,learner corpus,translation process,translation studies,LT3}},
  language     = {{eng}},
  location     = {{Marseille, France}},
  pages        = {{1807--1816}},
  publisher    = {{European Language Resources Association (ELRA)}},
  title        = {{LeConTra : a learner corpus of English-to-Dutch news translation}},
  url          = {{http://www.lrec-conf.org/proceedings/lrec2022/index.html}},
  year         = {{2022}},
}
