Investigating the quality of static anchor embeddings from transformers for under-resourced languages

Pranaydeep Singh (UGent) , Orphée De Clercq (UGent) and Els Lefever (UGent)
Abstract
This paper reports on experiments in cross-lingual transfer using the anchor-based approach of Schuster et al. (2019) for English and a low-resourced language, namely Hindi. For the sake of comparison, we also evaluate the approach on three very different higher-resourced languages, viz. Dutch, Russian and Chinese. Although the approach was initially designed for ELMo embeddings, we analyze it for the more recent BERT family of transformers on a variety of tasks, both monolingual and cross-lingual. The results largely show that, like most other cross-lingual transfer approaches, the static anchor approach is underwhelming for the low-resourced language, while performing adequately for the higher-resourced ones. We attempt to provide insights into both the quality of the anchors and the performance of low-shot cross-lingual transfer to better understand this performance gap. We make the extracted anchors and the modified train and test sets available for future research at https://github.com/pranaydeeps/Vyaapak
Keywords
LT3

Downloads

  • 2022.sigul-1.23.pdf
    • full text (Published version) | open access | PDF | 677.31 KB

Citation

MLA
Singh, Pranaydeep, et al. “Investigating the Quality of Static Anchor Embeddings from Transformers for Under-Resourced Languages.” Proceedings of the 1st Annual Meeting of the ELRA/ISCA Special Interest Group on Under-Resourced Languages, edited by Maite Melero et al., European Language Resources Association (ELRA), 2022, pp. 176–84.
APA
Singh, P., De Clercq, O., & Lefever, E. (2022). Investigating the quality of static anchor embeddings from transformers for under-resourced languages. In M. Melero, S. Sakti, & C. Soria (Eds.), Proceedings of the 1st Annual Meeting of the ELRA/ISCA Special Interest Group on Under-Resourced Languages (pp. 176–184). Marseille, France: European Language Resources Association (ELRA).
Chicago author-date
Singh, Pranaydeep, Orphée De Clercq, and Els Lefever. 2022. “Investigating the Quality of Static Anchor Embeddings from Transformers for Under-Resourced Languages.” In Proceedings of the 1st Annual Meeting of the ELRA/ISCA Special Interest Group on Under-Resourced Languages, edited by Maite Melero, Sakriani Sakti, and Claudia Soria, 176–84. Marseille, France: European Language Resources Association (ELRA).
Chicago author-date (all authors)
Singh, Pranaydeep, Orphée De Clercq, and Els Lefever. 2022. “Investigating the Quality of Static Anchor Embeddings from Transformers for Under-Resourced Languages.” In Proceedings of the 1st Annual Meeting of the ELRA/ISCA Special Interest Group on Under-Resourced Languages, ed. by Maite Melero, Sakriani Sakti, and Claudia Soria, 176–184. Marseille, France: European Language Resources Association (ELRA).
Vancouver
1. Singh P, De Clercq O, Lefever E. Investigating the quality of static anchor embeddings from transformers for under-resourced languages. In: Melero M, Sakti S, Soria C, editors. Proceedings of the 1st Annual Meeting of the ELRA/ISCA Special Interest Group on Under-Resourced Languages. Marseille, France: European Language Resources Association (ELRA); 2022. p. 176–84.
IEEE
[1] P. Singh, O. De Clercq, and E. Lefever, “Investigating the quality of static anchor embeddings from transformers for under-resourced languages,” in Proceedings of the 1st Annual Meeting of the ELRA/ISCA Special Interest Group on Under-Resourced Languages, Marseille, France, 2022, pp. 176–184.
BibTeX
@inproceedings{8759657,
  abstract     = {{This paper reports on experiments for cross-lingual transfer using the anchor-based approach of Schuster et al. (2019) for English and a low-resourced language, namely Hindi. For the sake of comparison, we also evaluate the approach on three very different higher-resourced languages, viz. Dutch, Russian and Chinese. Initially designed for ELMo embeddings, we analyze the approach for the more recent BERT family of transformers for a variety of tasks, both mono and cross-lingual. The results largely prove that like most other cross-lingual transfer approaches, the static anchor approach is underwhelming for the low-resource language, while performing adequately for the higher resourced ones. We attempt to provide insights into both the quality of the anchors, and the performance for low-shot cross-lingual transfer to better understand this performance gap. We make the extracted anchors and the modified train and test sets available for future research at https://github.com/pranaydeeps/Vyaapak}},
  author       = {{Singh, Pranaydeep and De Clercq, Orphée and Lefever, Els}},
  booktitle    = {{Proceedings of the 1st Annual Meeting of the ELRA/ISCA Special Interest Group on Under-Resourced Languages}},
  editor       = {{Melero, Maite and Sakti, Sakriani and Soria, Claudia}},
  isbn         = {{9791095546917}},
  keywords     = {{LT3}},
  language     = {{eng}},
  location     = {{Marseille, France}},
  pages        = {{176--184}},
  publisher    = {{European Language Resources Association (ELRA)}},
  title        = {{Investigating the quality of static anchor embeddings from transformers for under-resourced languages}},
  url          = {{http://www.lrec-conf.org/proceedings/lrec2022/workshops/SIGUL/index.html}},
  year         = {{2022}},
}