
Improving domain-specific cross-lingual embeddings with automatically generated bilingual dictionaries

Authors
Pranaydeep Singh, Ayla Rigouts Terryn, Els Lefever
Abstract
This paper reports on a set of proof-of-concept experiments performed to evaluate and improve the alignment of monolingual embeddings for a specialised domain, viz. the medical use case of heart failure. The presented approach, which creates domain-specific dictionaries on-the-fly from cross-lingual Wikipedia links, achieves good results for cross-lingual alignment of this specialised vocabulary in three language pairs: English-Dutch, English-French, and Dutch-French. The experimental results show that the setup incorporating a smaller but dedicated domain-specific dictionary outperforms the alignment incorporating a larger but general-domain seed dictionary. A detailed error analysis reveals that many potentially useful (near-)equivalents are found beyond those present in the gold standard, and it inspires strategies for further improvements, such as lemmatisation and improved tokenisation.
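The abstract does not state which mapping technique performs the alignment. For reference, here is a minimal sketch of one widely used supervised option: a seed dictionary built from interlanguage link pairs drives an orthogonal Procrustes mapping, which is then scored by precision@1 against held-out gold pairs. Everything in it is an assumption for illustration, not the authors' code. The function names (`build_seed_dictionary`, `fit_mapping`, `precision_at_1`) are hypothetical. The link pairs are toy data modelled on English-Dutch Wikipedia titles and were not extracted from Wikipedia. The "monolingual embeddings" are random vectors, with the English space built as an exact rotation of the Dutch one, so perfect recovery is expected.

```python
import numpy as np


def build_seed_dictionary(langlinks, src_vocab, tgt_vocab):
    """Hypothetical helper: turn (source title, target title) link pairs into
    a seed dictionary, keeping only pairs whose lowercased titles are present
    in both embedding vocabularies."""
    seed = []
    for src_title, tgt_title in langlinks:
        s, t = src_title.lower(), tgt_title.lower()
        if s in src_vocab and t in tgt_vocab:
            seed.append((s, t))
    return seed


def procrustes(X, Y):
    """Orthogonal W minimising ||X @ W - Y||_F, from the SVD of X^T Y."""
    U, _, Vt = np.linalg.svd(X.T @ Y)
    return U @ Vt


def fit_mapping(src_emb, tgt_emb, seed):
    """Stack the seed pairs row by row and solve for the orthogonal map."""
    X = np.stack([src_emb[s] for s, _ in seed])
    Y = np.stack([tgt_emb[t] for _, t in seed])
    return procrustes(X, Y)


def precision_at_1(src_emb, tgt_emb, W, gold):
    """Share of gold pairs whose mapped source vector has the gold
    translation as its cosine nearest neighbour in the target space."""
    words = list(tgt_emb)
    T = np.stack([tgt_emb[w] for w in words])
    T /= np.linalg.norm(T, axis=1, keepdims=True)
    hits = 0
    for s, t in gold:
        q = src_emb[s] @ W
        q /= np.linalg.norm(q)
        hits += words[int(np.argmax(T @ q))] == t
    return hits / len(gold)


# Toy data (assumption): Dutch vectors are random, and English vectors are
# the same vectors rotated by a random orthogonal matrix Q.
rng = np.random.default_rng(0)
dim = 3
en_nl = [("heart", "hart"), ("blood", "bloed"), ("kidney", "nier"),
         ("diuretic", "diureticum"), ("oedema", "oedeem"), ("lung", "long")]
tgt_emb = {nl: rng.normal(size=dim) for _, nl in en_nl}
Q, _ = np.linalg.qr(rng.normal(size=(dim, dim)))
src_emb = {en: tgt_emb[nl] @ Q.T for en, nl in en_nl}

# Toy "interlanguage links"; the multiword title is filtered out because the
# toy vocabularies contain single tokens only.
langlinks = [("Heart", "Hart"), ("Blood", "Bloed"), ("Kidney", "Nier"),
             ("Diuretic", "Diureticum"), ("Heart failure", "Hartfalen")]
seed = build_seed_dictionary(langlinks, src_emb, tgt_emb)
W = fit_mapping(src_emb, tgt_emb, seed)

# Held-out gold pairs that were not in the seed dictionary.
gold = [("oedema", "oedeem"), ("lung", "long")]
print(len(seed), precision_at_1(src_emb, tgt_emb, W, gold))  # prints: 4 1.0
```

Constraining W to be orthogonal preserves distances within the source space, which is why a handful of seed pairs suffices in this toy case. The "Heart failure" pair drops out only because the toy vocabularies hold single tokens. This hints at why tokenisation matters for multiword domain terms. Real monolingual embeddings are never exact rotations of one another, so real scores would fall well below this idealised result.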
Keywords
LT3

Downloads

  • 2022 CLIN Singh.pdf: full text (Published version), open access, PDF, 1.48 MB

Citation

MLA
Singh, Pranaydeep, et al. “Improving Domain-Specific Cross-Lingual Embeddings with Automatically Generated Bilingual Dictionaries.” Computational Linguistics in the Netherlands Journal, vol. 12, 2022, pp. 125–40.
APA
Singh, P., Rigouts Terryn, A., & Lefever, E. (2022). Improving domain-specific cross-lingual embeddings with automatically generated bilingual dictionaries. Computational Linguistics in the Netherlands Journal, 12, 125–140.
Chicago author-date
Singh, Pranaydeep, Ayla Rigouts Terryn, and Els Lefever. 2022. “Improving Domain-Specific Cross-Lingual Embeddings with Automatically Generated Bilingual Dictionaries.” Computational Linguistics in the Netherlands Journal 12: 125–40.
Chicago author-date (all authors)
Singh, Pranaydeep, Ayla Rigouts Terryn, and Els Lefever. 2022. “Improving Domain-Specific Cross-Lingual Embeddings with Automatically Generated Bilingual Dictionaries.” Computational Linguistics in the Netherlands Journal 12: 125–140.
Vancouver
1. Singh P, Rigouts Terryn A, Lefever E. Improving domain-specific cross-lingual embeddings with automatically generated bilingual dictionaries. Computational Linguistics in the Netherlands Journal. 2022;12:125–40.
IEEE
[1] P. Singh, A. Rigouts Terryn, and E. Lefever, “Improving domain-specific cross-lingual embeddings with automatically generated bilingual dictionaries,” Computational Linguistics in the Netherlands Journal, vol. 12, pp. 125–140, 2022.
BibTeX
@article{01GQ0MNB9V8BWFFYBVVMK85C11,
  abstract     = {{This paper reports on a set of proof-of-concept experiments performed to evaluate and improve the alignment of monolingual embeddings for a specialised domain, viz. the medical use case of heart failure. The presented approach, which creates domain-specific dictionaries on-the-fly from cross-lingual Wikipedia links, achieves good results for cross-lingual alignment of this specialised vocabulary in three language pairs: English-Dutch, English-French, and Dutch-French. The experimental results show that the setup incorporating a smaller but dedicated domain-specific dictionary outperforms the alignment incorporating a larger but general-domain seed dictionary. A detailed error analysis reveals that many potentially useful (near-)equivalents are found beyond those present in the gold standard, and it inspires strategies for further improvements, such as lemmatisation and improved tokenisation.}},
  author       = {{Singh, Pranaydeep and Rigouts Terryn, Ayla and Lefever, Els}},
  issn         = {{2211-4009}},
  journal      = {{Computational Linguistics in the Netherlands Journal}},
  keywords     = {{LT3}},
  language     = {{eng}},
  pages        = {{125--140}},
  title        = {{Improving domain-specific cross-lingual embeddings with automatically generated bilingual dictionaries}},
  url          = {{https://www.clinjournal.org/clinj/article/view/151}},
  volume       = {{12}},
  year         = {{2022}},
}