Ghent University Academic Bibliography

A fine-grained error analysis of NMT, PBMT and RBMT output for English-to-Dutch

Laura Van Brussel (UGent), Arda Tezcan (UGent) and Lieve Macken (UGent) (2018). Proceedings of the Eleventh International Conference on Language Resources and Evaluation, pp. 3799–3804.
abstract
This paper presents a fine-grained error comparison of the English-to-Dutch translations of a commercial neural, phrase-based and rule-based machine translation (MT) system. For phrase-based and rule-based machine translation, we make use of the annotated SCATE corpus of MT errors, enriching it with the annotation of neural MT errors and updating the SCATE error taxonomy to fit the neural MT output as well. Neural MT, in general, outperforms phrase-based and rule-based systems, especially for fluency, except for lexical issues. On the accuracy level, the improvements are less obvious. The target sentence does not always contain traces or clues that content is missing (omissions). This has repercussions for quality estimation or gisting operating only on the monolingual level. Mistranslations form another well-represented error category, comprising a high number of word-sense disambiguation errors and a variety of other mistranslation errors, which makes it more complex to annotate or post-edit.
type
conference (other)
publication status
published
keyword
LT3
in
Proceedings of the Eleventh International Conference on Language Resources and Evaluation
LREC 2018
editor
Nicoletta Calzolari, Khalid Choukri, Christopher Cieri, Thierry Declerck, Sara Goggi, Koiti Hasida, Hitoshi Isahara, Bente Maegaard, Joseph Mariani, Hélène Mazo, Asuncion Moreno, Jan Odijk, Stelios Piperidis and Takenobu Tokunaga
pages
6 pages
publisher
European Language Resources Association (ELRA)
place of publication
Miyazaki, Japan
conference name
Eleventh International Conference on Language Resources and Evaluation
conference organizer
ELRA
conference location
Miyazaki, Japan
conference start
2018-05-07
conference end
2018-05-12
ISBN
979-10-95546-00-9
language
English
UGent publication?
yes
classification
C1
copyright statement
I don't know the status of the copyright for this publication
id
8561558
handle
http://hdl.handle.net/1854/LU-8561558
date created
2018-05-14 08:44:16
date last changed
2018-07-09 13:02:41
@inproceedings{8561558,
  abstract     = {This paper presents a fine-grained error comparison of the English-to-Dutch translations of a commercial neural, phrase-based and rule-based machine translation (MT) system. For phrase-based and rule-based machine translation, we make use of the annotated SCATE corpus of MT errors, enriching it with the annotation of neural MT errors and updating the SCATE error taxonomy to fit the neural MT output as well. Neural MT, in general, outperforms phrase-based and rule-based systems, especially for fluency, except for lexical issues. On the accuracy level, the improvements are less obvious. The target sentence does not always contain traces or clues that content is missing (omissions). This has repercussions for quality estimation or gisting operating only on the monolingual level. Mistranslations form another well-represented error category, comprising a high number of word-sense disambiguation errors and a variety of other mistranslation errors, which makes it more complex to annotate or post-edit.},
  author       = {Van Brussel, Laura and Tezcan, Arda and Macken, Lieve},
  booktitle    = {Proceedings of the Eleventh International Conference on Language Resources and Evaluation},
  editor       = {Calzolari, Nicoletta and Choukri, Khalid and Cieri, Christopher and Declerck, Thierry and Goggi, Sara and Hasida, Koiti and Isahara, Hitoshi and Maegaard, Bente and Mariani, Joseph and Mazo, H{\'e}l{\`e}ne and Moreno, Asuncion and Odijk, Jan and Piperidis, Stelios and Tokunaga, Takenobu},
  isbn         = {979-10-95546-00-9},
  keyword      = {LT3},
  language     = {eng},
  location     = {Miyazaki, Japan},
  pages        = {3799--3804},
  publisher    = {European Language Resources Association (ELRA)},
  title        = {A fine-grained error analysis of NMT, PBMT and RBMT output for English-to-Dutch},
  year         = {2018},
}

Chicago
Van Brussel, Laura, Arda Tezcan, and Lieve Macken. 2018. “A Fine-grained Error Analysis of NMT, PBMT and RBMT Output for English-to-Dutch.” In Proceedings of the Eleventh International Conference on Language Resources and Evaluation, ed. Nicoletta Calzolari, Khalid Choukri, Christopher Cieri, Thierry Declerck, Sara Goggi, Koiti Hasida, Hitoshi Isahara, et al., 3799–3804. Miyazaki, Japan: European Language Resources Association (ELRA).
APA
Van Brussel, L., Tezcan, A., & Macken, L. (2018). A fine-grained error analysis of NMT, PBMT and RBMT output for English-to-Dutch. In N. Calzolari, K. Choukri, C. Cieri, T. Declerck, S. Goggi, K. Hasida, H. Isahara, et al. (Eds.), Proceedings of the Eleventh International Conference on Language Resources and Evaluation (pp. 3799–3804). Presented at the Eleventh International Conference on Language Resources and Evaluation, Miyazaki, Japan: European Language Resources Association (ELRA).
Vancouver
1.
Van Brussel L, Tezcan A, Macken L. A fine-grained error analysis of NMT, PBMT and RBMT output for English-to-Dutch. In: Calzolari N, Choukri K, Cieri C, Declerck T, Goggi S, Hasida K, et al., editors. Proceedings of the Eleventh International Conference on Language Resources and Evaluation. Miyazaki, Japan: European Language Resources Association (ELRA); 2018. p. 3799–804.
MLA
Van Brussel, Laura, Arda Tezcan, and Lieve Macken. “A Fine-grained Error Analysis of NMT, PBMT and RBMT Output for English-to-Dutch.” Proceedings of the Eleventh International Conference on Language Resources and Evaluation. Ed. Nicoletta Calzolari et al. Miyazaki, Japan: European Language Resources Association (ELRA), 2018. 3799–3804. Print.