
A fine-grained error analysis of NMT, PBMT and RBMT output for English-to-Dutch

Laura Van Brussel (UGent) , Arda Tezcan (UGent) and Lieve Macken (UGent)
Abstract
This paper presents a fine-grained error comparison of the English-to-Dutch translations of a commercial neural, phrase-based and rule-based machine translation (MT) system. For phrase-based and rule-based machine translation, we make use of the annotated SCATE corpus of MT errors, enriching it with the annotation of neural MT errors and updating the SCATE error taxonomy to fit the neural MT output as well. Neural, in general, outperforms phrase-based and rule-based systems especially for fluency, except for lexical issues. On the accuracy level, the improvements are less obvious. The target sentence does not always contain traces or clues of content being missing (omissions). This has repercussions for quality estimation or gisting operating only on the monolingual level. Mistranslations are part of another well-represented error category, comprising a high number of word-sense disambiguation errors and a variety of other mistranslation errors, making it more complex to annotate or post-edit.
Keywords
LT3

Downloads

  • LREC A FineGrained Error Analysis of NMT PBMT and RBMT Output for English-to-Dutch LVB.pdf
    • full text | open access | PDF | 391.92 KB

Citation

Chicago
Van Brussel, Laura, Arda Tezcan, and Lieve Macken. 2018. “A Fine-grained Error Analysis of NMT, PBMT and RBMT Output for English-to-Dutch.” In Proceedings of the Eleventh International Conference on Language Resources and Evaluation, ed. Nicoletta Calzolari, Khalid Choukri, Christopher Cieri, Thierry Declerck, Sara Goggi, Koiti Hasida, Hitoshi Isahara, et al., 3799–3804. Miyazaki, Japan: European Language Resources Association (ELRA).
APA
Van Brussel, L., Tezcan, A., & Macken, L. (2018). A fine-grained error analysis of NMT, PBMT and RBMT output for English-to-Dutch. In N. Calzolari, K. Choukri, C. Cieri, T. Declerck, S. Goggi, K. Hasida, H. Isahara, et al. (Eds.), Proceedings of the Eleventh International Conference on Language Resources and Evaluation (pp. 3799–3804). Presented at the Eleventh International Conference on Language Resources and Evaluation, Miyazaki, Japan: European Language Resources Association (ELRA).
Vancouver
1. Van Brussel L, Tezcan A, Macken L. A fine-grained error analysis of NMT, PBMT and RBMT output for English-to-Dutch. In: Calzolari N, Choukri K, Cieri C, Declerck T, Goggi S, Hasida K, et al., editors. Proceedings of the Eleventh International Conference on Language Resources and Evaluation. Miyazaki, Japan: European Language Resources Association (ELRA); 2018. p. 3799–804.
MLA
Van Brussel, Laura, Arda Tezcan, and Lieve Macken. “A Fine-grained Error Analysis of NMT, PBMT and RBMT Output for English-to-Dutch.” Proceedings of the Eleventh International Conference on Language Resources and Evaluation. Ed. Nicoletta Calzolari et al. Miyazaki, Japan: European Language Resources Association (ELRA), 2018. 3799–3804. Print.
BibTeX
@inproceedings{8561558,
  abstract     = {This paper presents a fine-grained error comparison of the English-to-Dutch translations of a commercial neural, phrase-based and rule-based machine translation (MT) system. For phrase-based and rule-based machine translation, we make use of the annotated SCATE corpus of MT errors, enriching it with the annotation of neural MT errors and updating the SCATE error taxonomy to fit the neural MT output as well. Neural, in general, outperforms phrase-based and rule-based systems especially for fluency, except for lexical issues. On the accuracy level, the improvements are less obvious. The target sentence does not always contain traces or clues of content being missing (omissions). This has repercussions for quality estimation or gisting operating only on the monolingual level. Mistranslations are part of another well-represented error category, comprising a high number of word-sense disambiguation errors and a variety of other mistranslation errors, making it more complex to annotate or post-edit.},
  author       = {Van Brussel, Laura and Tezcan, Arda and Macken, Lieve},
  booktitle    = {Proceedings of the Eleventh International Conference on Language Resources and Evaluation},
  editor       = {Calzolari, Nicoletta and Choukri, Khalid and Cieri, Christopher and Declerck, Thierry and Goggi, Sara and Hasida, Koiti and Isahara, Hitoshi and Maegaard, Bente and Mariani, Joseph and Mazo, H{\'e}l{\`e}ne and Moreno, Asuncion and Odijk, Jan and Piperidis, Stelios and Tokunaga, Takenobu},
  isbn         = {979-10-95546-00-9},
  keyword      = {LT3},
  language     = {eng},
  location     = {Miyazaki, Japan},
  pages        = {3799--3804},
  publisher    = {European Language Resources Association (ELRA)},
  title        = {A fine-grained error analysis of NMT, PBMT and RBMT output for English-to-Dutch},
  year         = {2018},
}