Towards a better integration of fuzzy matches in neural machine translation through data augmentation
- Author
- Arda Tezcan (UGent), Bram Bulté and Bram Vanroy (UGent)
- Abstract
- We identify a number of aspects that can boost the performance of Neural Fuzzy Repair (NFR), an easy-to-implement method to integrate translation memory matches and neural machine translation (NMT). We explore various ways of maximising the added value of retrieved matches within the NFR paradigm for eight language combinations, using Transformer NMT systems. In particular, we test the impact of different fuzzy matching techniques, sub-word-level segmentation methods and alignment-based features on overall translation quality. Furthermore, we propose a fuzzy match combination technique that aims to maximise the coverage of source words. This is supplemented with an analysis of how translation quality is affected by input sentence length and fuzzy match score. The results show that applying a combination of the tested modifications leads to a significant increase in estimated translation quality over all baselines for all language combinations.
- Keywords
- translation memories, data augmentation, fuzzy matching, NMT, sub-word units, lt3
Downloads
- Tezcan et al - 2021 - Towards a better integration of fuzzy matches in neural machine translation through data augmentation.pdf
- full text (Published version), open access, 760.51 KB
Citation
Please use this url to cite or link to this publication: http://hdl.handle.net/1854/LU-8690842
- MLA
- Tezcan, Arda, et al. “Towards a Better Integration of Fuzzy Matches in Neural Machine Translation through Data Augmentation.” INFORMATICS-BASEL, vol. 8, no. 1, 2021, doi:10.3390/informatics8010007.
- APA
- Tezcan, A., Bulté, B., & Vanroy, B. (2021). Towards a better integration of fuzzy matches in neural machine translation through data augmentation. INFORMATICS-BASEL, 8(1). https://doi.org/10.3390/informatics8010007
- Chicago author-date
- Tezcan, Arda, Bram Bulté, and Bram Vanroy. 2021. “Towards a Better Integration of Fuzzy Matches in Neural Machine Translation through Data Augmentation.” INFORMATICS-BASEL 8 (1). https://doi.org/10.3390/informatics8010007.
- Chicago author-date (all authors)
- Tezcan, Arda, Bram Bulté, and Bram Vanroy. 2021. “Towards a Better Integration of Fuzzy Matches in Neural Machine Translation through Data Augmentation.” INFORMATICS-BASEL 8 (1). doi:10.3390/informatics8010007.
- Vancouver
- 1. Tezcan A, Bulté B, Vanroy B. Towards a better integration of fuzzy matches in neural machine translation through data augmentation. INFORMATICS-BASEL. 2021;8(1).
- IEEE
- [1] A. Tezcan, B. Bulté, and B. Vanroy, “Towards a better integration of fuzzy matches in neural machine translation through data augmentation,” INFORMATICS-BASEL, vol. 8, no. 1, 2021.
@article{8690842,
  abstract     = {{We identify a number of aspects that can boost the performance of Neural Fuzzy Repair (NFR), an easy-to-implement method to integrate translation memory matches and neural machine translation (NMT). We explore various ways of maximising the added value of retrieved matches within the NFR paradigm for eight language combinations, using Transformer NMT systems. In particular, we test the impact of different fuzzy matching techniques, sub-word-level segmentation methods and alignment-based features on overall translation quality. Furthermore, we propose a fuzzy match combination technique that aims to maximise the coverage of source words. This is supplemented with an analysis of how translation quality is affected by input sentence length and fuzzy match score. The results show that applying a combination of the tested modifications leads to a significant increase in estimated translation quality over all baselines for all language combinations.}},
  articleno    = {{7}},
  author       = {{Tezcan, Arda and Bulté, Bram and Vanroy, Bram}},
  issn         = {{2227-9709}},
  journal      = {{INFORMATICS-BASEL}},
  keywords     = {{translation memories,data augmentation,fuzzy matching,NMT,sub-word units,lt3}},
  language     = {{eng}},
  number       = {{1}},
  pages        = {{27}},
  title        = {{Towards a better integration of fuzzy matches in neural machine translation through data augmentation}},
  url          = {{http://doi.org/10.3390/informatics8010007}},
  volume       = {{8}},
  year         = {{2021}},
}