
Transfer learning from multi-lingual speech translation benefits low-resource speech recognition
- Author
- Geoffroy Vanderreydt (UGent), François Remy (UGent) and Kris Demuynck (UGent)
- Abstract
- In this article, we propose a simple yet effective approach to train an end-to-end speech recognition system on languages with limited resources by leveraging a large pre-trained wav2vec2.0 model fine-tuned on a multi-lingual speech translation task. We show that the weights of this model form an excellent initialization for Connectionist Temporal Classification (CTC) speech recognition, a different but closely related task. We explore the benefits of this initialization for various languages, both in-domain and out-of-domain for the speech translation task. Our experiments on the CommonVoice dataset confirm that our approach performs significantly better in-domain, and is often better out-of-domain too. This method is particularly relevant for Automatic Speech Recognition (ASR) with limited data and/or compute budget during training.
- Keywords
- XLS-R, CTC, ASR, Speech Recognition
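The abstract above describes initializing a CTC recognizer from a wav2vec2.0 encoder that was fine-tuned for multi-lingual speech translation. The sketch below illustrates one way this transfer could be set up with the Hugging Face transformers library; it is not the authors' released code, and the checkpoint names and vocabulary size are assumptions for illustration only.

```python
# Minimal sketch (assumptions, not the authors' code): transfer the encoder of a
# wav2vec2.0 speech-translation model into a CTC ASR model before fine-tuning.
from transformers import SpeechEncoderDecoderModel, Wav2Vec2ForCTC

# Speech-translation model whose wav2vec2 encoder we want to reuse
# (assumed checkpoint; any wav2vec2-based ST model with a compatible encoder could work).
st_model = SpeechEncoderDecoderModel.from_pretrained(
    "facebook/wav2vec2-xls-r-300m-en-to-15"
)

# CTC model built on the same wav2vec2 architecture, with a fresh character-level
# output head for the low-resource target language (vocab_size is hypothetical).
asr_model = Wav2Vec2ForCTC.from_pretrained(
    "facebook/wav2vec2-xls-r-300m",
    vocab_size=40,
    ctc_loss_reduction="mean",
)

# Overwrite the self-supervised encoder weights with the translation-fine-tuned ones;
# strict=False tolerates layers present in only one of the two models (e.g. adapters).
asr_model.wav2vec2.load_state_dict(st_model.encoder.state_dict(), strict=False)

# asr_model can now be fine-tuned with the CTC loss on the limited
# transcribed speech available for the target language.
```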
Downloads
- (...).pdf | full text (Published version) | UGent only | 234.36 KB
Citation
Please use this url to cite or link to this publication: http://hdl.handle.net/1854/LU-8769045
- MLA
- Vanderreydt, Geoffroy, et al. “Transfer Learning from Multi-Lingual Speech Translation Benefits Low-Resource Speech Recognition.” INTERSPEECH 2022, International Speech Communication Association (ISCA), 2022, pp. 3053–57, doi:10.21437/interspeech.2022-10744.
- APA
- Vanderreydt, G., Remy, F., & Demuynck, K. (2022). Transfer learning from multi-lingual speech translation benefits low-resource speech recognition. INTERSPEECH 2022, 3053–3057. https://doi.org/10.21437/interspeech.2022-10744
- Chicago author-date
- Vanderreydt, Geoffroy, François Remy, and Kris Demuynck. 2022. “Transfer Learning from Multi-Lingual Speech Translation Benefits Low-Resource Speech Recognition.” In INTERSPEECH 2022, 3053–57. International Speech Communication Association (ISCA). https://doi.org/10.21437/interspeech.2022-10744.
- Chicago author-date (all authors)
- Vanderreydt, Geoffroy, François Remy, and Kris Demuynck. 2022. “Transfer Learning from Multi-Lingual Speech Translation Benefits Low-Resource Speech Recognition.” In INTERSPEECH 2022, 3053–3057. International Speech Communication Association (ISCA). doi:10.21437/interspeech.2022-10744.
- Vancouver
- 1. Vanderreydt G, Remy F, Demuynck K. Transfer learning from multi-lingual speech translation benefits low-resource speech recognition. In: INTERSPEECH 2022. International Speech Communication Association (ISCA); 2022. p. 3053–7.
- IEEE
- [1] G. Vanderreydt, F. Remy, and K. Demuynck, “Transfer learning from multi-lingual speech translation benefits low-resource speech recognition,” in INTERSPEECH 2022, Incheon, Korea, 2022, pp. 3053–3057.
- BibTeX
@inproceedings{8769045,
  abstract = {{In this article, we propose a simple yet effective approach to train an end-to-end speech recognition system on languages with limited resources by leveraging a large pre-trained wav2vec2.0 model fine-tuned on a multi-lingual speech translation task. We show that the weights of this model form an excellent initialization for Connectionist Temporal Classification (CTC) speech recognition, a different but closely related task. We explore the benefits of this initialization for various languages, both in-domain and out-of-domain for the speech translation task. Our experiments on the CommonVoice dataset confirm that our approach performs significantly better in-domain, and is often better out-of-domain too. This method is particularly relevant for Automatic Speech Recognition (ASR) with limited data and/or compute budget during training.}},
  author = {{Vanderreydt, Geoffroy and Remy, François and Demuynck, Kris}},
  booktitle = {{INTERSPEECH 2022}},
  issn = {{2308-457X}},
  keywords = {{XLS-R,CTC,ASR,Speech Recognition}},
  language = {{eng}},
  location = {{Incheon, Korea}},
  pages = {{3053--3057}},
  publisher = {{International Speech Communication Association (ISCA)}},
  title = {{Transfer learning from multi-lingual speech translation benefits low-resource speech recognition}},
  url = {{http://doi.org/10.21437/interspeech.2022-10744}},
  year = {{2022}},
}