
Integrating fuzzy matches into sentence-level quality estimation for neural machine translation

Author: Arda Tezcan (UGent)
Abstract
Previous studies show that neural machine translation (NMT) systems produce higher-quality translations when sentences that are highly similar to a given input sentence (i.e. fuzzy matches; FMs) can be found in the NMT training data. This study explores the usefulness of FMs for the task of sentence-level quality estimation (QE) for NMT. To this end, FMs are integrated, through a data augmentation methodology, into a QE architecture that utilizes a pre-trained XLM-RoBERTa model. The results show that FMs improve QE performance in domain-specific scenarios when translation edit rate (TER) scores are used as quality labels. However, similar improvements are not observed when the same methodology is applied to a general-domain setting in which quality labels were generated either through direct (manual) assessment of translation quality or by measuring the technical post-editing effort required to transform the MT output into its post-edited version.
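The abstract rests on two ingredients: retrieving fuzzy matches for an input sentence and using TER-based quality labels. The sketch below illustrates both under stated assumptions; it is not the paper's implementation. The similarity measure (token-level Levenshtein distance normalized by the longer sentence), the in-memory translation memory `tm` of (source, target) pairs standing in for the NMT training data, the `threshold` of 0.5, the `</s>` separator, and the source / MT / FM-target ordering of the augmented input are all hypothetical choices. `simplified_ter` omits TER's block shifts, so it may overestimate TER when words are reordered.

```python
"""Hypothetical sketch: fuzzy-match retrieval, an augmented QE input,
and a shift-free approximation of TER. Not the paper's implementation."""

from typing import List, Optional, Tuple


def token_edit_distance(a: List[str], b: List[str]) -> int:
    """Levenshtein distance over tokens (insert/delete/substitute cost 1)."""
    prev = list(range(len(b) + 1))
    for i, tok_a in enumerate(a, start=1):
        curr = [i] + [0] * len(b)
        for j, tok_b in enumerate(b, start=1):
            cost = 0 if tok_a == tok_b else 1
            curr[j] = min(prev[j] + 1,         # delete tok_a
                          curr[j - 1] + 1,     # insert tok_b
                          prev[j - 1] + cost)  # substitute or match
        prev = curr
    return prev[-1]


def fuzzy_score(a: str, b: str) -> float:
    """Similarity in [0, 1]: 1 - edit distance / length of the longer sentence."""
    ta, tb = a.split(), b.split()
    longest = max(len(ta), len(tb))
    if longest == 0:
        return 1.0
    return 1.0 - token_edit_distance(ta, tb) / longest


def best_fuzzy_match(source: str, tm: List[Tuple[str, str]],
                     threshold: float = 0.5) -> Optional[Tuple[str, str, float]]:
    """Return (fm_source, fm_target, score) of the most similar TM entry
    whose score reaches the (hypothetical) threshold, or None."""
    best = None
    for fm_src, fm_tgt in tm:
        score = fuzzy_score(source, fm_src)
        if score >= threshold and (best is None or score > best[2]):
            best = (fm_src, fm_tgt, score)
    return best


def augmented_qe_input(source: str, mt: str, tm: List[Tuple[str, str]],
                       sep: str = " </s> ") -> str:
    """Hypothetical augmented input: source, MT output, then FM target if found."""
    parts = [source, mt]
    fm = best_fuzzy_match(source, tm)
    if fm is not None:
        parts.append(fm[1])
    return sep.join(parts)


def simplified_ter(mt: str, post_edit: str) -> float:
    """Word edits needed to turn the MT output into its post-edit, divided by
    the post-edit length. Block shifts (part of real TER) are omitted."""
    ref = post_edit.split()
    return token_edit_distance(mt.split(), ref) / max(len(ref), 1)


if __name__ == "__main__":
    tm = [("the cat sat on the mat", "de kat zat op de mat"),
          ("a dog barked", "een hond blafte")]
    print(augmented_qe_input("the cat sat on a mat", "de kat zat op een mat", tm))
    print(simplified_ter("de kat zat op een mat", "de kat zat op de mat"))
```

Running the script prints the augmented string `the cat sat on a mat </s> de kat zat op een mat </s> de kat zat op de mat`, because the first TM entry differs by one substituted token (score 5/6). The linear scan over `tm` is chosen for readability; a realistic translation memory would need an index to keep retrieval tractable.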
Keywords
quality estimation, machine translation, fuzzy matching, large language models

Downloads

  • 08 Tezcan integrating.pdf
    • full text (Published version) | open access | PDF | 2.96 MB

Citation

Please use this URL to cite or link to this publication:

MLA
Tezcan, Arda. “Integrating Fuzzy Matches into Sentence-Level Quality Estimation for Neural Machine Translation.” Computational Linguistics in the Netherlands Journal, vol. 12, 2022, pp. 99–123.
APA
Tezcan, A. (2022). Integrating fuzzy matches into sentence-level quality estimation for neural machine translation. Computational Linguistics in the Netherlands Journal, 12, 99–123.
Chicago author-date
Tezcan, Arda. 2022. “Integrating Fuzzy Matches into Sentence-Level Quality Estimation for Neural Machine Translation.” Computational Linguistics in the Netherlands Journal 12: 99–123.
Vancouver
1. Tezcan A. Integrating fuzzy matches into sentence-level quality estimation for neural machine translation. Computational Linguistics in the Netherlands Journal. 2022;12:99–123.
IEEE
[1] A. Tezcan, “Integrating fuzzy matches into sentence-level quality estimation for neural machine translation,” Computational Linguistics in the Netherlands Journal, vol. 12, pp. 99–123, 2022.
@article{01H1V1MMZ86HQZ57AXKDHMEHCP,
  abstract     = {{Previous studies show that neural machine translation (NMT) systems produce higher-quality translations when sentences that are highly similar to a given input sentence (i.e. fuzzy matches; FMs) can be found in the NMT training data. This study explores the usefulness of FMs for the task of sentence-level quality estimation (QE) for NMT. To this end, FMs are integrated, through a data augmentation methodology, into a QE architecture that utilizes a pre-trained XLM-RoBERTa model. The results show that FMs improve QE performance in domain-specific scenarios when translation edit rate (TER) scores are used as quality labels. However, similar improvements are not observed when the same methodology is applied to a general-domain setting in which quality labels were generated either through direct (manual) assessment of translation quality or by measuring the technical post-editing effort required to transform the MT output into its post-edited version.}},
  author       = {{Tezcan, Arda}},
  issn         = {{2211-4009}},
  journal      = {{Computational Linguistics in the Netherlands Journal}},
  keywords     = {{quality estimation,machine translation,fuzzy matching,large language models}},
  language     = {{eng}},
  pages        = {{99--123}},
  title        = {{Integrating fuzzy matches into sentence-level quality estimation for neural machine translation}},
  url          = {{https://clinjournal.org/clinj/article/view/150}},
  volume       = {{12}},
  year         = {{2022}},
}