Investigating the quality of static anchor embeddings from transformers for under-resourced languages
- Author
- Pranaydeep Singh (UGent), Orphée De Clercq (UGent) and Els Lefever (UGent)
- Abstract
- This paper reports on experiments for cross-lingual transfer using the anchor-based approach of Schuster et al. (2019) for English and a low-resourced language, namely Hindi. For the sake of comparison, we also evaluate the approach on three very different higher-resourced languages, viz. Dutch, Russian and Chinese. The approach was initially designed for ELMo embeddings; we analyze it for the more recent BERT family of transformers on a variety of tasks, both monolingual and cross-lingual. The results largely confirm that, like most other cross-lingual transfer approaches, the static anchor approach is underwhelming for the low-resource language, while performing adequately for the higher-resourced ones. We attempt to provide insights into both the quality of the anchors and the performance for low-shot cross-lingual transfer to better understand this performance gap. We make the extracted anchors and the modified train and test sets available for future research at https://github.com/pranaydeeps/Vyaapak
- Keywords
- LT3
Downloads
- 2022.sigul-1.23.pdf: full text (Published version), open access, 677.31 KB
Citation
Please use this url to cite or link to this publication: http://hdl.handle.net/1854/LU-8759657
- MLA
- Singh, Pranaydeep, et al. “Investigating the Quality of Static Anchor Embeddings from Transformers for Under-Resourced Languages.” Proceedings of the 1st Annual Meeting of the ELRA/ISCA Special Interest Group on Under-Resourced Languages, edited by Maite Melero et al., European Language Resources Association (ELRA), 2022, pp. 176–84.
- APA
- Singh, P., De Clercq, O., & Lefever, E. (2022). Investigating the quality of static anchor embeddings from transformers for under-resourced languages. In M. Melero, S. Sakti, & C. Soria (Eds.), Proceedings of the 1st Annual Meeting of the ELRA/ISCA Special Interest Group on Under-Resourced Languages (pp. 176–184). Marseille, France: European Language Resources Association (ELRA).
- Chicago author-date
- Singh, Pranaydeep, Orphée De Clercq, and Els Lefever. 2022. “Investigating the Quality of Static Anchor Embeddings from Transformers for Under-Resourced Languages.” In Proceedings of the 1st Annual Meeting of the ELRA/ISCA Special Interest Group on Under-Resourced Languages, edited by Maite Melero, Sakriani Sakti, and Claudia Soria, 176–84. Marseille, France: European Language Resources Association (ELRA).
- Chicago author-date (all authors)
- Singh, Pranaydeep, Orphée De Clercq, and Els Lefever. 2022. “Investigating the Quality of Static Anchor Embeddings from Transformers for Under-Resourced Languages.” In Proceedings of the 1st Annual Meeting of the ELRA/ISCA Special Interest Group on Under-Resourced Languages, edited by Maite Melero, Sakriani Sakti, and Claudia Soria, 176–184. Marseille, France: European Language Resources Association (ELRA).
- Vancouver
- 1. Singh P, De Clercq O, Lefever E. Investigating the quality of static anchor embeddings from transformers for under-resourced languages. In: Melero M, Sakti S, Soria C, editors. Proceedings of the 1st Annual Meeting of the ELRA/ISCA Special Interest Group on Under-Resourced Languages. Marseille, France: European Language Resources Association (ELRA); 2022. p. 176–84.
- IEEE
- [1] P. Singh, O. De Clercq, and E. Lefever, “Investigating the quality of static anchor embeddings from transformers for under-resourced languages,” in Proceedings of the 1st Annual Meeting of the ELRA/ISCA Special Interest Group on Under-Resourced Languages, Marseille, France, 2022, pp. 176–184.
@inproceedings{8759657,
  abstract = {{This paper reports on experiments for cross-lingual transfer using the anchor-based approach of Schuster et al. (2019) for English and a low-resourced language, namely Hindi. For the sake of comparison, we also evaluate the approach on three very different higher-resourced languages, viz. Dutch, Russian and Chinese. The approach was initially designed for ELMo embeddings; we analyze it for the more recent BERT family of transformers on a variety of tasks, both monolingual and cross-lingual. The results largely confirm that, like most other cross-lingual transfer approaches, the static anchor approach is underwhelming for the low-resource language, while performing adequately for the higher-resourced ones. We attempt to provide insights into both the quality of the anchors and the performance for low-shot cross-lingual transfer to better understand this performance gap. We make the extracted anchors and the modified train and test sets available for future research at https://github.com/pranaydeeps/Vyaapak}},
  author = {{Singh, Pranaydeep and De Clercq, Orphée and Lefever, Els}},
  booktitle = {{Proceedings of the 1st Annual Meeting of the ELRA/ISCA Special Interest Group on Under-Resourced Languages}},
  editor = {{Melero, Maite and Sakti, Sakriani and Soria, Claudia}},
  isbn = {{9791095546917}},
  keywords = {{LT3}},
  language = {{eng}},
  location = {{Marseille, France}},
  pages = {{176--184}},
  publisher = {{European Language Resources Association (ELRA)}},
  title = {{Investigating the quality of static anchor embeddings from transformers for under-resourced languages}},
  url = {{http://www.lrec-conf.org/proceedings/lrec2022/workshops/SIGUL/index.html}},
  year = {{2022}},
}