Improving fuzzy match augmented neural machine translation in specialised domains through synthetic data

Abstract
Previous studies have demonstrated the effectiveness of fuzzy match (FM) augmentation in improving the performance of Neural Machine Translation (NMT) models. However, this approach exhibits limitations when applied to scenarios where limited parallel datasets are available for NMT training. This study investigates the effectiveness of leveraging additional monolingual data to improve FM-augmented NMT performance by generating synthetic parallel datasets in domain-specific scenarios. To this end, we adopt a simple strategy for combining two data augmentation methods for NMT, namely back-translation and Neural Fuzzy Repair (NFR). Experiments conducted on three language directions, namely English→Ukrainian, English→French and French→English, two domains and various dataset sizes show that this simple approach yields significant and substantial improvements in estimated translation quality.
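The combination described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the `back_translate` stub, the `@@@` separator, the token-level `difflib` similarity, the 0.5 threshold and the toy sentences are all assumptions made for illustration. The idea shown is that monolingual target-language sentences are back-translated into synthetic source sentences, the synthetic pairs are pooled with the authentic ones, and each source sentence is then augmented with the target side of its best fuzzy match.

```python
from difflib import SequenceMatcher

SEP = " @@@ "  # assumed separator between the source and the fuzzy-match target


def back_translate(target_sentence):
    """Hypothetical stand-in for a trained target-to-source NMT model."""
    toy_reverse_model = {
        "fermez la vanne": "close the valve",
        "ouvrez la vanne": "open the valve",
    }
    return toy_reverse_model[target_sentence]


def similarity(a, b):
    """Token-level similarity ratio in [0, 1] (an assumed fuzzy-match metric)."""
    return SequenceMatcher(None, a.split(), b.split()).ratio()


def best_fuzzy_match(source, memory, threshold=0.5):
    """Return the most similar (source, target) pair in memory, or None."""
    candidates = [(similarity(source, s), s, t) for s, t in memory if s != source]
    if not candidates:
        return None
    score, s, t = max(candidates, key=lambda c: c[0])
    return (s, t) if score >= threshold else None


def augment(source, memory):
    """Append the target side of the best fuzzy match to the source sentence."""
    match = best_fuzzy_match(source, memory)
    return source if match is None else source + SEP + match[1]


# Step 1: back-translate monolingual target data into synthetic parallel pairs.
monolingual_target = ["fermez la vanne", "ouvrez la vanne"]
synthetic = [(back_translate(t), t) for t in monolingual_target]

# Step 2: pool the authentic and synthetic pairs into one translation memory.
authentic = [("close the door", "fermez la porte")]
memory = authentic + synthetic

# Step 3: build fuzzy-match-augmented training pairs from the pooled data.
training = [(augment(s, memory), t) for s, t in memory]

for src, tgt in training:
    print(src, "->", tgt)
```

In this sketch the synthetic pairs serve both as training examples and as fuzzy-match candidates; whether the study uses them in both roles is not stated in the abstract.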
Keywords
machine translation, fuzzy match augmentation, lt3

Downloads

  • art-tezcan-skidanova-moerman.pdf
    • full text (Published version), open access, PDF, 592.25 KB

Citation

MLA
Tezcan, Arda, et al. “Improving Fuzzy Match Augmented Neural Machine Translation in Specialised Domains through Synthetic Data.” PRAGUE BULLETIN OF MATHEMATICAL LINGUISTICS, no. 122, 2024, pp. 9–42, doi:10.14712/00326585.030.
APA
Tezcan, A., Skidanova, A., & Moerman, T. (2024). Improving fuzzy match augmented neural machine translation in specialised domains through synthetic data. PRAGUE BULLETIN OF MATHEMATICAL LINGUISTICS, (122), 9–42. https://doi.org/10.14712/00326585.030
Chicago author-date
Tezcan, Arda, Alina Skidanova, and Thomas Moerman. 2024. “Improving Fuzzy Match Augmented Neural Machine Translation in Specialised Domains through Synthetic Data.” PRAGUE BULLETIN OF MATHEMATICAL LINGUISTICS, no. 122: 9–42. https://doi.org/10.14712/00326585.030.
Chicago author-date (all authors)
Tezcan, Arda, Alina Skidanova, and Thomas Moerman. 2024. “Improving Fuzzy Match Augmented Neural Machine Translation in Specialised Domains through Synthetic Data.” PRAGUE BULLETIN OF MATHEMATICAL LINGUISTICS (122): 9–42. doi:10.14712/00326585.030.
Vancouver
1. Tezcan A, Skidanova A, Moerman T. Improving fuzzy match augmented neural machine translation in specialised domains through synthetic data. PRAGUE BULLETIN OF MATHEMATICAL LINGUISTICS. 2024;(122):9–42.
IEEE
[1] A. Tezcan, A. Skidanova, and T. Moerman, “Improving fuzzy match augmented neural machine translation in specialised domains through synthetic data,” PRAGUE BULLETIN OF MATHEMATICAL LINGUISTICS, no. 122, pp. 9–42, 2024.
@article{01JH03BTMC46HDCFMG2PJMV7AW,
  abstract     = {{Previous studies have demonstrated the effectiveness of fuzzy match (FM) augmentation in improving the performance of Neural Machine Translation (NMT) models. However, this approach exhibits limitations when applied to scenarios where limited parallel datasets are available for NMT training. This study investigates the effectiveness of leveraging additional monolingual data to improve FM-augmented NMT performance by generating synthetic parallel datasets in domain-specific scenarios. To this end, we adopt a simple strategy for combining two data augmentation methods for NMT, namely back-translation and Neural Fuzzy Repair (NFR). Experiments conducted on three language directions, namely English→Ukrainian, English→French and French→English, two domains and various dataset sizes show that this simple approach yields significant and substantial improvements in estimated translation quality.}},
  author       = {{Tezcan, Arda and Skidanova, Alina and Moerman, Thomas}},
  issn         = {{0032-6585}},
  journal      = {{PRAGUE BULLETIN OF MATHEMATICAL LINGUISTICS}},
  keywords     = {{machine translation,fuzzy match augmentation,lt3}},
  language     = {{eng}},
  number       = {{122}},
  pages        = {{9--42}},
  title        = {{Improving fuzzy match augmented neural machine translation in specialised domains through synthetic data}},
  url          = {{http://doi.org/10.14712/00326585.030}},
  year         = {{2024}},
}
