
Transfer learning from multi-lingual speech translation benefits low-resource speech recognition

Geoffroy Vanderreydt (UGent), François Remy (UGent) and Kris Demuynck (UGent)
(2022) INTERSPEECH 2022, pp. 3053–3057
Abstract
In this article, we propose a simple yet effective approach to train an end-to-end speech recognition system on languages with limited resources by leveraging a large pre-trained wav2vec2.0 model fine-tuned on a multi-lingual speech translation task. We show that the weights of this model form an excellent initialization for Connectionist Temporal Classification (CTC) speech recognition, a different but closely related task. We explore the benefits of this initialization for various languages, both in-domain and out-of-domain for the speech translation task. Our experiments on the CommonVoice dataset confirm that our approach performs significantly better in-domain, and is often better out-of-domain too. This method is particularly relevant for Automatic Speech Recognition (ASR) with limited data and/or compute budget during training.
Keywords
XLS-R, CTC, ASR, Speech Recognition
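The core idea of the paper is reusing a speech-translation checkpoint as the starting point for CTC fine-tuning. This is not the authors' code; it is a schematic pure-Python sketch of that kind of weight transfer, with plain dicts of lists standing in for tensors (a real setup would use e.g. PyTorch state dicts): shared encoder weights are copied over, the translation decoder is discarded, and a fresh CTC output head is randomly initialized. All names (`encoder.`, `decoder.`, `ctc_head.weight`) are illustrative assumptions.

```python
import random

def init_ctc_from_translation(st_weights, ctc_head_shape, seed=0):
    """Build CTC-model weights from a speech-translation checkpoint:
    copy the shared encoder, drop the decoder, add a random CTC head."""
    rng = random.Random(seed)
    ctc_weights = {}
    for name, tensor in st_weights.items():
        if name.startswith("encoder."):       # shared wav2vec2-style encoder
            ctc_weights[name] = list(tensor)  # transferred as-is
    rows, cols = ctc_head_shape               # (vocab_size, hidden_dim)
    ctc_weights["ctc_head.weight"] = [
        [rng.gauss(0.0, 0.02) for _ in range(cols)] for _ in range(rows)
    ]
    return ctc_weights

# Toy speech-translation checkpoint: two encoder layers plus a decoder
# layer that serves translation only and is not reused for CTC.
st = {
    "encoder.layer0": [0.1, 0.2],
    "encoder.layer1": [0.3, 0.4],
    "decoder.layer0": [9.9, 9.9],
}
ctc = init_ctc_from_translation(st, ctc_head_shape=(5, 2))
print(sorted(ctc))  # → ['ctc_head.weight', 'encoder.layer0', 'encoder.layer1']
```

In frameworks like PyTorch the same effect is typically achieved with partial state-dict loading, so that only the parameters shared between the two tasks are initialized from the checkpoint.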

Downloads

  • (...).pdf — full text (Published version) | UGent only | PDF | 234.36 KB

Citation

Please use this url to cite or link to this publication:

MLA
Vanderreydt, Geoffroy, et al. “Transfer Learning from Multi-Lingual Speech Translation Benefits Low-Resource Speech Recognition.” INTERSPEECH 2022, International Speech Communication Association (ISCA), 2022, pp. 3053–57, doi:10.21437/interspeech.2022-10744.
APA
Vanderreydt, G., Remy, F., & Demuynck, K. (2022). Transfer learning from multi-lingual speech translation benefits low-resource speech recognition. INTERSPEECH 2022, 3053–3057. https://doi.org/10.21437/interspeech.2022-10744
Chicago author-date
Vanderreydt, Geoffroy, François Remy, and Kris Demuynck. 2022. “Transfer Learning from Multi-Lingual Speech Translation Benefits Low-Resource Speech Recognition.” In INTERSPEECH 2022, 3053–57. International Speech Communication Association (ISCA). https://doi.org/10.21437/interspeech.2022-10744.
Chicago author-date (all authors)
Vanderreydt, Geoffroy, François Remy, and Kris Demuynck. 2022. “Transfer Learning from Multi-Lingual Speech Translation Benefits Low-Resource Speech Recognition.” In INTERSPEECH 2022, 3053–3057. International Speech Communication Association (ISCA). doi:10.21437/interspeech.2022-10744.
Vancouver
1. Vanderreydt G, Remy F, Demuynck K. Transfer learning from multi-lingual speech translation benefits low-resource speech recognition. In: INTERSPEECH 2022. International Speech Communication Association (ISCA); 2022. p. 3053–7.
IEEE
[1] G. Vanderreydt, F. Remy, and K. Demuynck, “Transfer learning from multi-lingual speech translation benefits low-resource speech recognition,” in INTERSPEECH 2022, Incheon, Korea, 2022, pp. 3053–3057.
@inproceedings{8769045,
  abstract     = {{In this article, we propose a simple yet effective approach to train an end-to-end speech recognition system on languages with limited resources by leveraging a large pre-trained wav2vec2.0 model fine-tuned on a multi-lingual speech translation task. We show that the weights of this model form an excellent initialization for Connectionist Temporal Classification (CTC) speech recognition, a different but closely related task. We explore the benefits of this initialization for various languages, both in-domain and out-of-domain for the speech translation task. Our experiments on the CommonVoice dataset confirm that our approach performs significantly better in-domain, and is often better out-of-domain too. This method is particularly relevant for Automatic Speech Recognition (ASR) with limited data and/or compute budget during training.}},
  author       = {{Vanderreydt, Geoffroy and Remy, François and Demuynck, Kris}},
  booktitle    = {{INTERSPEECH 2022}},
  issn         = {{2308-457X}},
  keywords     = {{XLS-R,CTC,ASR,Speech Recognition}},
  language     = {{eng}},
  location     = {{Incheon, Korea}},
  pages        = {{3053--3057}},
  publisher    = {{International Speech Communication Association (ISCA)}},
  title        = {{Transfer learning from multi-lingual speech translation benefits low-resource speech recognition}},
  url          = {{https://doi.org/10.21437/interspeech.2022-10744}},
  year         = {{2022}},
}
