
Improving domain-specific cross-lingual embeddings with automatically generated bilingual dictionaries
- Author
- Pranaydeep Singh (UGent), Ayla Rigouts Terryn and Els Lefever (UGent)
- Abstract
- This paper reports on a set of proof-of-concept experiments performed to evaluate and improve the alignment of monolingual embeddings for a specialised domain, viz. the medical use case of heart failure. The presented approach, which creates domain-specific dictionaries on-the-fly from cross-lingual Wikipedia links, achieves good results for cross-lingual alignment of this specialised vocabulary in three language pairs: English-Dutch, English-French, and Dutch-French. The experimental results show that the setup incorporating a smaller but dedicated domain-specific dictionary outperforms the alignment incorporating a larger but general-domain seed dictionary. A detailed error analysis reveals that many potentially useful (near-)equivalents are found beyond those present in the gold standard, and it inspires strategies for further improvements, such as lemmatisation and improved tokenisation.
- Keywords
- LT3
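
The dictionary-creation step described in the abstract can be illustrated with a minimal sketch. It assumes the cross-lingual Wikipedia links have already been collected into a mapping (one possible source is the MediaWiki API's `prop=langlinks` query) and that a pair is kept only when both sides occur in the respective embedding vocabularies. The function name `build_seed_dictionary`, the toy data, and the filtering rule are hypothetical and not taken from the paper.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical toy data: English article title -> {language code: linked title}.
# In practice these would be Wikipedia's interlanguage links for domain articles.
TOY_LANGLINKS: Dict[str, Dict[str, str]] = {
    "Heart failure": {"nl": "Hartfalen", "fr": "Insuffisance cardiaque"},
    "Diuretic": {"nl": "Diureticum", "fr": "Diurétique"},
    "Cardiology": {"nl": "Cardiologie"},
}


def build_seed_dictionary(
    langlinks: Dict[str, Dict[str, str]],
    target_lang: str,
    source_vocab: Set[str],
    target_vocab: Set[str],
) -> List[Tuple[str, str]]:
    """Return sorted (source, target) pairs usable as an alignment seed."""
    pairs = []
    for source_title, links in langlinks.items():
        target_title = links.get(target_lang)
        if target_title is None:
            continue  # no cross-lingual link for this language
        src, tgt = source_title.lower(), target_title.lower()
        # Assumption: keep a pair only if both sides have an embedding.
        if src in source_vocab and tgt in target_vocab:
            pairs.append((src, tgt))
    return sorted(pairs)


# Hypothetical vocabularies of the monolingual embedding models.
en_vocab = {"heart failure", "diuretic", "cardiology"}
nl_vocab = {"hartfalen", "diureticum"}
seed_en_nl = build_seed_dictionary(TOY_LANGLINKS, "nl", en_vocab, nl_vocab)
```

Here `seed_en_nl` contains only the pairs covered by both vocabularies. A small list of this kind would then serve as the seed dictionary for whichever supervised alignment method is used.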
Downloads
- 2022 CLIN Singh.pdf
- full text (Published version) | open access | 1.48 MB
Citation
Please use this url to cite or link to this publication: http://hdl.handle.net/1854/LU-01GQ0MNB9V8BWFFYBVVMK85C11
- MLA
- Singh, Pranaydeep, et al. “Improving Domain-Specific Cross-Lingual Embeddings with Automatically Generated Bilingual Dictionaries.” COMPUTATIONAL LINGUISTICS IN THE NETHERLANDS JOURNAL, vol. 12, 2022, pp. 125–40.
- APA
- Singh, P., Rigouts Terryn, A., & Lefever, E. (2022). Improving domain-specific cross-lingual embeddings with automatically generated bilingual dictionaries. COMPUTATIONAL LINGUISTICS IN THE NETHERLANDS JOURNAL, 12, 125–140.
- Chicago author-date
- Singh, Pranaydeep, Ayla Rigouts Terryn, and Els Lefever. 2022. “Improving Domain-Specific Cross-Lingual Embeddings with Automatically Generated Bilingual Dictionaries.” COMPUTATIONAL LINGUISTICS IN THE NETHERLANDS JOURNAL 12: 125–40.
- Chicago author-date (all authors)
- Singh, Pranaydeep, Ayla Rigouts Terryn, and Els Lefever. 2022. “Improving Domain-Specific Cross-Lingual Embeddings with Automatically Generated Bilingual Dictionaries.” COMPUTATIONAL LINGUISTICS IN THE NETHERLANDS JOURNAL 12: 125–140.
- Vancouver
- 1. Singh P, Rigouts Terryn A, Lefever E. Improving domain-specific cross-lingual embeddings with automatically generated bilingual dictionaries. COMPUTATIONAL LINGUISTICS IN THE NETHERLANDS JOURNAL. 2022;12:125–40.
- IEEE
- [1] P. Singh, A. Rigouts Terryn, and E. Lefever, “Improving domain-specific cross-lingual embeddings with automatically generated bilingual dictionaries,” COMPUTATIONAL LINGUISTICS IN THE NETHERLANDS JOURNAL, vol. 12, pp. 125–140, 2022.
@article{01GQ0MNB9V8BWFFYBVVMK85C11,
  abstract = {{This paper reports on a set of proof-of-concept experiments performed to evaluate and improve the alignment of monolingual embeddings for a specialised domain, viz. the medical use case of heart failure. The presented approach, which creates domain-specific dictionaries on-the-fly from cross-lingual Wikipedia links, achieves good results for cross-lingual alignment of this specialised vocabulary in three language pairs: English-Dutch, English-French, and Dutch-French. The experimental results show that the setup incorporating a smaller but dedicated domain-specific dictionary outperforms the alignment incorporating a larger but general-domain seed dictionary. A detailed error analysis reveals that many potentially useful (near-)equivalents are found beyond those present in the gold standard, and it inspires strategies for further improvements, such as lemmatisation and improved tokenisation.}},
  author = {{Singh, Pranaydeep and Rigouts Terryn, Ayla and Lefever, Els}},
  issn = {{2211-4009}},
  journal = {{COMPUTATIONAL LINGUISTICS IN THE NETHERLANDS JOURNAL}},
  keywords = {{LT3}},
  language = {{eng}},
  pages = {{125--140}},
  title = {{Improving domain-specific cross-lingual embeddings with automatically generated bilingual dictionaries}},
  url = {{https://www.clinjournal.org/clinj/article/view/151}},
  volume = {{12}},
  year = {{2022}},
}