
Improving fuzzy match augmented neural machine translation in specialised domains through synthetic data
- Author
- Arda Tezcan (UGent), Alina Skidanova and Thomas Moerman (UGent)
- Abstract
- Previous studies have demonstrated the effectiveness of fuzzy match (FM) augmentation in improving the performance of Neural Machine Translation (NMT) models. However, this approach exhibits limitations when applied to scenarios where limited parallel datasets are available for NMT training. This study investigates the effectiveness of leveraging additional monolingual data to improve FM-augmented NMT performance by generating synthetic parallel datasets in domain-specific scenarios. To this end, we adopt a simple strategy for combining two data augmentation methods for NMT, namely back-translation and Neural Fuzzy Repair (NFR). Experiments conducted on three language directions, namely English→Ukrainian, English→French and French→English, two domains and various dataset sizes show that this simple approach yields significant and substantial improvements in estimated translation quality.
- Keywords
- machine translation, fuzzy match augmentation, lt3
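The combination described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that back-translation pairs each monolingual target-side sentence with a machine-generated source sentence, and that fuzzy match augmentation appends the target side of the most similar translation memory entry to the source sentence. The separator token, the similarity measure (token-level `difflib.SequenceMatcher`), the 0.5 threshold, and all function names are hypothetical choices made for illustration only.

```python
from difflib import SequenceMatcher

# Hypothetical separator between the source sentence and the fuzzy match target.
SEPARATOR = "@@@"


def back_translate(target_sentences, reverse_translate):
    """Build synthetic (source, target) pairs from monolingual target-side text.

    `reverse_translate` is any callable mapping a target-language sentence to a
    source-language sentence (for example, a target-to-source NMT model).
    """
    return [(reverse_translate(t), t) for t in target_sentences]


def similarity(a, b):
    """Token-level similarity in [0, 1]; a stand-in for a real fuzzy match metric."""
    return SequenceMatcher(None, a.split(), b.split()).ratio()


def best_fuzzy_match(source, memory, threshold=0.5):
    """Return the most similar (source, target) pair in `memory`, or None."""
    best, best_score = None, threshold
    for mem_source, mem_target in memory:
        if mem_source == source:
            continue  # skip identical sources so a sentence does not retrieve itself
        score = similarity(source, mem_source)
        if score >= best_score:
            best, best_score = (mem_source, mem_target), score
    return best


def augment(source, memory, threshold=0.5):
    """Append the target side of the best fuzzy match to the source sentence."""
    match = best_fuzzy_match(source, memory, threshold)
    if match is None:
        return source  # no match above the threshold: leave the input unchanged
    return f"{source} {SEPARATOR} {match[1]}"
```

In this sketch, the synthetic pairs returned by `back_translate` would simply be added to the list of genuine parallel pairs used as `memory`, so that more source sentences find a match above the threshold.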
Downloads
- art-tezcan-skidanova-moerman.pdf: full text (Published version), open access, 592.25 KB
Citation
Please use this URL to cite or link to this publication: http://hdl.handle.net/1854/LU-01JH03BTMC46HDCFMG2PJMV7AW
- MLA
- Tezcan, Arda, et al. “Improving Fuzzy Match Augmented Neural Machine Translation in Specialised Domains through Synthetic Data.” PRAGUE BULLETIN OF MATHEMATICAL LINGUISTICS, no. 122, 2024, pp. 9–42, doi:10.14712/00326585.030.
- APA
- Tezcan, A., Skidanova, A., & Moerman, T. (2024). Improving fuzzy match augmented neural machine translation in specialised domains through synthetic data. PRAGUE BULLETIN OF MATHEMATICAL LINGUISTICS, (122), 9–42. https://doi.org/10.14712/00326585.030
- Chicago author-date
- Tezcan, Arda, Alina Skidanova, and Thomas Moerman. 2024. “Improving Fuzzy Match Augmented Neural Machine Translation in Specialised Domains through Synthetic Data.” PRAGUE BULLETIN OF MATHEMATICAL LINGUISTICS, no. 122: 9–42. https://doi.org/10.14712/00326585.030.
- Chicago author-date (all authors)
- Tezcan, Arda, Alina Skidanova, and Thomas Moerman. 2024. “Improving Fuzzy Match Augmented Neural Machine Translation in Specialised Domains through Synthetic Data.” PRAGUE BULLETIN OF MATHEMATICAL LINGUISTICS (122): 9–42. doi:10.14712/00326585.030.
- Vancouver
- 1. Tezcan A, Skidanova A, Moerman T. Improving fuzzy match augmented neural machine translation in specialised domains through synthetic data. PRAGUE BULLETIN OF MATHEMATICAL LINGUISTICS. 2024;(122):9–42.
- IEEE
- [1] A. Tezcan, A. Skidanova, and T. Moerman, “Improving fuzzy match augmented neural machine translation in specialised domains through synthetic data,” PRAGUE BULLETIN OF MATHEMATICAL LINGUISTICS, no. 122, pp. 9–42, 2024.
@article{01JH03BTMC46HDCFMG2PJMV7AW,
  abstract = {{Previous studies have demonstrated the effectiveness of fuzzy match (FM) augmentation in improving the performance of Neural Machine Translation (NMT) models. However, this approach exhibits limitations when applied to scenarios where limited parallel datasets are available for NMT training. This study investigates the effectiveness of leveraging additional monolingual data to improve FM-augmented NMT performance by generating synthetic parallel datasets in domain-specific scenarios. To this end, we adopt a simple strategy for combining two data augmentation methods for NMT, namely back-translation and Neural Fuzzy Repair (NFR). Experiments conducted on three language directions, namely English→Ukrainian, English→French and French→English, two domains and various dataset sizes show that this simple approach yields significant and substantial improvements in estimated translation quality.}},
  author = {{Tezcan, Arda and Skidanova, Alina and Moerman, Thomas}},
  issn = {{0032-6585}},
  journal = {{PRAGUE BULLETIN OF MATHEMATICAL LINGUISTICS}},
  keywords = {{machine translation,fuzzy match augmentation,lt3}},
  language = {{eng}},
  number = {{122}},
  pages = {{9--42}},
  title = {{Improving fuzzy match augmented neural machine translation in specialised domains through synthetic data}},
  url = {{http://doi.org/10.14712/00326585.030}},
  year = {{2024}},
}