Tailoring machine translation for scientific literature through topic filtering and fuzzy match augmentation

Author
Thomas Moerman, Tom Vanallemeersch, Sara Szoc, Arda Tezcan
Abstract
To enhance the accessibility of scientific literature in multiple languages and facilitate the exchange of information among scholars and a wider audience, there is a need for high-performing specialized machine translation (MT) engines. However, this requires efficient filtering and the use of domain-specific data. In this study, we investigate whether approaches for increasing training data using topic filtering and more efficient use of such data through exploiting fuzzy matches (i.e. similar translations to a given input; FMs) improve translation quality. We apply these techniques both to sequence-to-sequence MT models and off-the-shelf multilingual large language models (LLMs) in three scientific disciplines. Our results suggest that the combination of topic filtering and FM augmentation is an effective strategy for training neural machine translation (NMT) models from scratch, not only surpassing baseline NMT models but also delivering improved translation performance compared to smaller LLMs in terms of the number of parameters. Furthermore, we find that although FM augmentation through in-context learning generally improves LLM translation performance, limited domain-specific datasets can yield results comparable to those achieved with additional multi-domain datasets.
Keywords
Machine Translation, Artificial Intelligence, Computational Linguistics

Downloads

  • publisher version.pdf: full text (Published version) | open access | PDF | 707.97 KB

Citation

Please use this URL to cite or link to this publication:

MLA
Moerman, Thomas, et al. “Tailoring Machine Translation for Scientific Literature through Topic Filtering and Fuzzy Match Augmentation.” Proceedings of the Eleventh Workshop on Patent and Scientific Literature Translation (PSLT 2025), edited by Takashi Tsunakawa et al., European Association for Machine Translation (EAMT), 2025, pp. 13–26.
APA
Moerman, T., Vanallemeersch, T., Szoc, S., & Tezcan, A. (2025). Tailoring machine translation for scientific literature through topic filtering and fuzzy match augmentation. In T. Tsunakawa, K. Sudoh, & I. Goto (Eds.), Proceedings of the Eleventh Workshop on Patent and Scientific Literature Translation (PSLT 2025) (pp. 13–26). European Association for Machine Translation (EAMT).
Chicago author-date
Moerman, Thomas, Tom Vanallemeersch, Sara Szoc, and Arda Tezcan. 2025. “Tailoring Machine Translation for Scientific Literature through Topic Filtering and Fuzzy Match Augmentation.” In Proceedings of the Eleventh Workshop on Patent and Scientific Literature Translation (PSLT 2025), edited by Takashi Tsunakawa, Katsuhito Sudoh, and Isao Goto, 13–26. European Association for Machine Translation (EAMT).
Chicago author-date (all authors)
Moerman, Thomas, Tom Vanallemeersch, Sara Szoc, and Arda Tezcan. 2025. “Tailoring Machine Translation for Scientific Literature through Topic Filtering and Fuzzy Match Augmentation.” In Proceedings of the Eleventh Workshop on Patent and Scientific Literature Translation (PSLT 2025), edited by Takashi Tsunakawa, Katsuhito Sudoh, and Isao Goto, 13–26. European Association for Machine Translation (EAMT).
Vancouver
1. Moerman T, Vanallemeersch T, Szoc S, Tezcan A. Tailoring machine translation for scientific literature through topic filtering and fuzzy match augmentation. In: Tsunakawa T, Sudoh K, Goto I, editors. Proceedings of the Eleventh Workshop on Patent and Scientific Literature Translation (PSLT 2025). European Association for Machine Translation (EAMT); 2025. p. 13–26.
IEEE
[1] T. Moerman, T. Vanallemeersch, S. Szoc, and A. Tezcan, “Tailoring machine translation for scientific literature through topic filtering and fuzzy match augmentation,” in Proceedings of the Eleventh Workshop on Patent and Scientific Literature Translation (PSLT 2025), Geneva, Switzerland, 2025, pp. 13–26.
BibTeX
@inproceedings{01K3DXBJHN8C2WMRCK39W0S7X8,
  abstract     = {{To enhance the accessibility of scientific literature in multiple languages and facilitate the exchange of information among scholars and a wider audience, there is a need for high-performing specialized machine translation (MT) engines. However, this requires efficient filtering and the use of domain-specific data. In this study, we investigate whether approaches for increasing training data using topic filtering and more efficient use of such data through exploiting fuzzy matches (i.e. similar translations to a given input; FMs) improve translation quality. We apply these techniques both to sequence-to-sequence MT models and off-the-shelf multilingual large language models (LLMs) in three scientific disciplines. Our results suggest that the combination of topic filtering and FM augmentation is an effective strategy for training neural machine translation (NMT) models from scratch, not only surpassing baseline NMT models but also delivering improved translation performance compared to smaller LLMs in terms of the number of parameters. Furthermore, we find that although FM augmentation through in-context learning generally improves LLM translation performance, limited domain-specific datasets can yield results comparable to those achieved with additional multi-domain datasets.}},
  author       = {{Moerman, Thomas and Vanallemeersch, Tom and Szoc, Sara and Tezcan, Arda}},
  booktitle    = {{Proceedings of the Eleventh Workshop on Patent and Scientific Literature Translation (PSLT 2025)}},
  editor       = {{Tsunakawa, Takashi and Sudoh, Katsuhito and Goto, Isao}},
  isbn         = {{9782970189725}},
  keywords     = {{Machine Translation,Artificial Intelligence,Computational Linguistics}},
  language     = {{eng}},
  location     = {{Geneva, Switzerland}},
  pages        = {{13--26}},
  publisher    = {{European Association for Machine Translation (EAMT)}},
  title        = {{Tailoring machine translation for scientific literature through topic filtering and fuzzy match augmentation}},
  url          = {{https://aclanthology.org/2025.pslt-1.2/}},
  year         = {{2025}},
}