Integrating fuzzy matches into sentence-level quality estimation for neural machine translation
- Author
- Arda Tezcan (UGent)
- Abstract
- Previous studies show that neural machine translation (NMT) systems produce translations with higher quality when highly similar sentences (i.e. fuzzy matches; FMs) to a given input sentence can be found in the NMT training data. This study explores the usefulness of FMs for the task of sentence-level quality estimation (QE) for NMT. To this end, fuzzy matches are integrated into the QE architecture that utilizes a pre-trained XLM-RoBERTa model, through a data augmentation methodology. The results show that FMs improve QE performance in domain-specific scenarios when using translation edit rate (TER) as quality labels. However, similar improvements are not observed when the same methodology is applied to a general-domain setting when quality labels were generated through direct (manual) assessment of translation quality or by measuring the technical post-editing effort required for transforming the MT output to its post-edited version.
- Keywords
- quality estimation, machine translation, fuzzy matching, large language models
Downloads
- 08 Tezcan integrating.pdf
- full text (Published version) | open access | 2.96 MB
Citation
Please use this url to cite or link to this publication: http://hdl.handle.net/1854/LU-01H1V1MMZ86HQZ57AXKDHMEHCP
- MLA
- Tezcan, Arda. “Integrating Fuzzy Matches into Sentence-Level Quality Estimation for Neural Machine Translation.” COMPUTATIONAL LINGUISTICS IN THE NETHERLANDS JOURNAL, vol. 12, 2022, pp. 99–123.
- APA
- Tezcan, A. (2022). Integrating fuzzy matches into sentence-level quality estimation for neural machine translation. COMPUTATIONAL LINGUISTICS IN THE NETHERLANDS JOURNAL, 12, 99–123.
- Chicago author-date
- Tezcan, Arda. 2022. “Integrating Fuzzy Matches into Sentence-Level Quality Estimation for Neural Machine Translation.” COMPUTATIONAL LINGUISTICS IN THE NETHERLANDS JOURNAL 12: 99–123.
- Vancouver
- 1. Tezcan A. Integrating fuzzy matches into sentence-level quality estimation for neural machine translation. COMPUTATIONAL LINGUISTICS IN THE NETHERLANDS JOURNAL. 2022;12:99–123.
- IEEE
- [1] A. Tezcan, “Integrating fuzzy matches into sentence-level quality estimation for neural machine translation,” COMPUTATIONAL LINGUISTICS IN THE NETHERLANDS JOURNAL, vol. 12, pp. 99–123, 2022.
@article{01H1V1MMZ86HQZ57AXKDHMEHCP,
  abstract = {{Previous studies show that neural machine translation (NMT) systems produce translations with higher quality when highly similar sentences (i.e. fuzzy matches; FMs) to a given input sentence can be found in the NMT training data. This study explores the usefulness of FMs for the task of sentence-level quality estimation (QE) for NMT. To this end, fuzzy matches are integrated into the QE architecture that utilizes a pre-trained XLM-RoBERTa model, through a data augmentation methodology. The results show that FMs improve QE performance in domain-specific scenarios when using translation edit rate (TER) as quality labels. However, similar improvements are not observed when the same methodology is applied to a general-domain setting when quality labels were generated through direct (manual) assessment of translation quality or by measuring the technical post-editing effort required for transforming the MT output to its post-edited version.}},
  author = {{Tezcan, Arda}},
  issn = {{2211-4009}},
  journal = {{COMPUTATIONAL LINGUISTICS IN THE NETHERLANDS JOURNAL}},
  keywords = {{quality estimation,machine translation,fuzzy matching,large language models}},
  language = {{eng}},
  pages = {{99--123}},
  title = {{Integrating fuzzy matches into sentence-level quality estimation for neural machine translation}},
  url = {{https://clinjournal.org/clinj/article/view/150}},
  volume = {{12}},
  year = {{2022}},
}