
SCATE taxonomy and corpus of machine translation errors
- Author
- Arda Tezcan (UGent), Veronique Hoste (UGent) and Lieve Macken (UGent)
- Abstract
- Quality Estimation (QE) and error analysis of Machine Translation (MT) output remain active areas in Natural Language Processing (NLP) research. Many recent efforts have focused on Machine Learning (ML) systems to estimate MT quality, translation errors, post-editing speed or post-editing effort. As the accuracy of such ML tasks relies on the availability of corpora, there is an increasing need for large corpora of machine translations annotated with translation errors, together with error annotation guidelines to produce consistent annotations. Drawing on previous work on translation error taxonomies, we present the SCATE (Smart Computer-aided Translation Environment) MT error taxonomy, which is hierarchical in nature and is based upon the familiar notions of accuracy and fluency. In the SCATE annotation framework, we annotate fluency errors in the target text and accuracy errors in both the source and target text, while linking the source and target annotations. We also propose a novel method for alignment-based Inter-Annotator Agreement (IAA) analysis and show that this method can be used effectively on large annotation sets. Using the SCATE taxonomy and guidelines, we create the first corpus of MT errors for the English-Dutch language pair, consisting of Statistical Machine Translation (SMT) and Rule-Based Machine Translation (RBMT) errors, which is a valuable resource not only for NLP tasks in this field but also for studying the relationship between MT errors and post-editing effort in the future. Finally, we analyse the error profiles of the SMT and RBMT systems used in this study and compare the quality of these two different MT architectures based on the error types.
- Keywords
- LT3, Machine translation, Quality estimation, Post-editing, Machine learning, Feature selection
Citation
Please use this url to cite or link to this publication: http://hdl.handle.net/1854/LU-8544345
- MLA
- Tezcan, Arda, et al. “SCATE Taxonomy and Corpus of Machine Translation Errors.” Trends in E-Tools and Resources for Translators and Interpreters, edited by Gloria Corpas Pastor and Isabel Durán-Muñoz, vol. 45, Brill | Rodopi, 2017, pp. 219–44, doi:10.1163/9789004351790_012.
- APA
- Tezcan, A., Hoste, V., & Macken, L. (2017). SCATE taxonomy and corpus of machine translation errors. In G. C. Pastor & I. Durán-Muñoz (Eds.), Trends in E-tools and resources for translators and interpreters (Vol. 45, pp. 219–244). https://doi.org/10.1163/9789004351790_012
- Chicago author-date
- Tezcan, Arda, Veronique Hoste, and Lieve Macken. 2017. “SCATE Taxonomy and Corpus of Machine Translation Errors.” In Trends in E-Tools and Resources for Translators and Interpreters, edited by Gloria Corpas Pastor and Isabel Durán-Muñoz, 45:219–44. Brill | Rodopi. https://doi.org/10.1163/9789004351790_012.
- Chicago author-date (all authors)
- Tezcan, Arda, Veronique Hoste, and Lieve Macken. 2017. “SCATE Taxonomy and Corpus of Machine Translation Errors.” In Trends in E-Tools and Resources for Translators and Interpreters, edited by Gloria Corpas Pastor and Isabel Durán-Muñoz, 45:219–244. Brill | Rodopi. doi:10.1163/9789004351790_012.
- Vancouver
- 1. Tezcan A, Hoste V, Macken L. SCATE taxonomy and corpus of machine translation errors. In: Pastor GC, Durán-Muñoz I, editors. Trends in E-tools and resources for translators and interpreters. Brill | Rodopi; 2017. p. 219–44.
- IEEE
- [1] A. Tezcan, V. Hoste, and L. Macken, “SCATE taxonomy and corpus of machine translation errors,” in Trends in E-tools and resources for translators and interpreters, vol. 45, G. C. Pastor and I. Durán-Muñoz, Eds. Brill | Rodopi, 2017, pp. 219–244.
@incollection{8544345,
  abstract  = {{Quality Estimation (QE) and error analysis of Machine Translation (MT) output remain active areas in Natural Language Processing (NLP) research. Many recent efforts have focused on Machine Learning (ML) systems to estimate MT quality, translation errors, post-editing speed or post-editing effort. As the accuracy of such ML tasks relies on the availability of corpora, there is an increasing need for large corpora of machine translations annotated with translation errors, together with error annotation guidelines to produce consistent annotations. Drawing on previous work on translation error taxonomies, we present the SCATE (Smart Computer-aided Translation Environment) MT error taxonomy, which is hierarchical in nature and is based upon the familiar notions of accuracy and fluency. In the SCATE annotation framework, we annotate fluency errors in the target text and accuracy errors in both the source and target text, while linking the source and target annotations. We also propose a novel method for alignment-based Inter-Annotator Agreement (IAA) analysis and show that this method can be used effectively on large annotation sets. Using the SCATE taxonomy and guidelines, we create the first corpus of MT errors for the English-Dutch language pair, consisting of Statistical Machine Translation (SMT) and Rule-Based Machine Translation (RBMT) errors, which is a valuable resource not only for NLP tasks in this field but also for studying the relationship between MT errors and post-editing effort in the future. Finally, we analyse the error profiles of the SMT and RBMT systems used in this study and compare the quality of these two different MT architectures based on the error types.}},
  author    = {{Tezcan, Arda and Hoste, Veronique and Macken, Lieve}},
  booktitle = {{Trends in E-tools and resources for translators and interpreters}},
  editor    = {{Pastor, Gloria Corpas and Durán-Muñoz, Isabel}},
  isbn      = {{9789004351790}},
  keywords  = {{LT3, Machine translation, Quality estimation, Post-editing, Machine learning, Feature selection}},
  language  = {{eng}},
  pages     = {{219--244}},
  publisher = {{Brill | Rodopi}},
  series    = {{Approaches to Translation Studies}},
  title     = {{SCATE taxonomy and corpus of machine translation errors}},
  url       = {{http://doi.org/10.1163/9789004351790_012}},
  volume    = {{45}},
  year      = {{2017}},
}