The Idlab voxsrc-20 submission : large margin fine-tuning and quality-aware score calibration in DNN based speaker verification
- Author
- Jenthe Thienpondt (UGent), Brecht Desplanques and Kris Demuynck (UGent)
- Abstract
- In this paper we propose and analyse a large margin fine-tuning strategy and a quality-aware score calibration in text-independent speaker verification. Large margin fine-tuning is a secondary training stage for DNN based speaker verification systems trained with margin-based loss functions. It enables the network to create more robust speaker embeddings by enabling the use of longer training utterances in combination with a more aggressive margin penalty. Score calibration is a common practice in speaker verification systems to map output scores to well-calibrated log-likelihood-ratios, which can be converted to interpretable probabilities. By including quality features in the calibration system, the decision thresholds of the evaluation metrics become quality-dependent and more consistent across varying trial conditions. Applying both enhancements on the ECAPA-TDNN architecture leads to state-of-the-art results on all publicly available VoxCeleb1 test sets and contributed to our winning submissions in the supervised verification tracks of the VoxCeleb Speaker Recognition Challenge 2020.
- Keywords
- speaker recognition, speaker verification, score calibration, RECOGNITION
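The abstract's description of score calibration can be illustrated with a minimal sketch. It assumes an affine calibration model in which the raw verification score and a vector of quality features are combined into a log-likelihood ratio (LLR); the function names, weights, and the single quality feature used below are hypothetical and are not taken from the paper. The conversion from an LLR to a posterior probability given a target prior is the standard Bayes relation.

```python
import math

def calibrate(score, quality, w_score, w_quality, bias):
    """Map a raw score plus quality features to an LLR (assumed affine model)."""
    quality_term = sum(w * q for w, q in zip(w_quality, quality))
    return w_score * score + quality_term + bias

def llr_to_posterior(llr, prior_target):
    """Convert an LLR to P(target | evidence) for a given target prior."""
    log_odds = llr + math.log(prior_target / (1.0 - prior_target))
    return 1.0 / (1.0 + math.exp(-log_odds))

def raw_score_threshold(quality, w_score, w_quality, bias):
    """Raw score at which the calibrated LLR equals 0 for these quality features."""
    quality_term = sum(w * q for w, q in zip(w_quality, quality))
    return -(quality_term + bias) / w_score

# Hypothetical weights; in practice they would be fitted on held-out trials.
W_SCORE, W_QUALITY, BIAS = 4.0, [0.25], -1.0

llr = calibrate(0.5, [2.0], W_SCORE, W_QUALITY, BIAS)
posterior = llr_to_posterior(llr, prior_target=0.5)
```

With these hypothetical weights, the raw-score threshold that corresponds to an LLR of 0 moves from 0.25 when the quality feature is 0.0 to 0.125 when it is 2.0, which is the sense in which the decision threshold becomes quality-dependent.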
Downloads
- (...).pdf: full text (Published version), UGent only, 1.92 MB
- DS430 acc.pdf: full text (Accepted manuscript), open access, 230.52 KB
Citation
Please use this url to cite or link to this publication: http://hdl.handle.net/1854/LU-8713072
- MLA
- Thienpondt, Jenthe, et al. “The Idlab Voxsrc-20 Submission : Large Margin Fine-Tuning and Quality-Aware Score Calibration in DNN Based Speaker Verification.” ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), IEEE, 2021, pp. 5814–18, doi:10.1109/icassp39728.2021.9414600.
- APA
- Thienpondt, J., Desplanques, B., & Demuynck, K. (2021). The Idlab voxsrc-20 submission : large margin fine-tuning and quality-aware score calibration in DNN based speaker verification. ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 5814–5818. https://doi.org/10.1109/icassp39728.2021.9414600
- Chicago author-date
- Thienpondt, Jenthe, Brecht Desplanques, and Kris Demuynck. 2021. “The Idlab Voxsrc-20 Submission : Large Margin Fine-Tuning and Quality-Aware Score Calibration in DNN Based Speaker Verification.” In ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 5814–18. IEEE. https://doi.org/10.1109/icassp39728.2021.9414600.
- Chicago author-date (all authors)
- Thienpondt, Jenthe, Brecht Desplanques, and Kris Demuynck. 2021. “The Idlab Voxsrc-20 Submission : Large Margin Fine-Tuning and Quality-Aware Score Calibration in DNN Based Speaker Verification.” In ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 5814–5818. IEEE. doi:10.1109/icassp39728.2021.9414600.
- Vancouver
- 1. Thienpondt J, Desplanques B, Demuynck K. The Idlab voxsrc-20 submission : large margin fine-tuning and quality-aware score calibration in DNN based speaker verification. In: ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE; 2021. p. 5814–8.
- IEEE
- [1] J. Thienpondt, B. Desplanques, and K. Demuynck, “The Idlab voxsrc-20 submission : large margin fine-tuning and quality-aware score calibration in DNN based speaker verification,” in ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Toronto, Canada, 2021, pp. 5814–5818.
@inproceedings{8713072,
  abstract = {{In this paper we propose and analyse a large margin fine-tuning strategy and a quality-aware score calibration in text-independent speaker verification. Large margin fine-tuning is a secondary training stage for DNN based speaker verification systems trained with margin-based loss functions. It enables the network to create more robust speaker embeddings by enabling the use of longer training utterances in combination with a more aggressive margin penalty. Score calibration is a common practice in speaker verification systems to map output scores to well-calibrated log-likelihood-ratios, which can be converted to interpretable probabilities. By including quality features in the calibration system, the decision thresholds of the evaluation metrics become quality-dependent and more consistent across varying trial conditions. Applying both enhancements on the ECAPA-TDNN architecture leads to state-of-the-art results on all publicly available VoxCeleb1 test sets and contributed to our winning submissions in the supervised verification tracks of the VoxCeleb Speaker Recognition Challenge 2020.}},
  author = {{Thienpondt, Jenthe and Desplanques, Brecht and Demuynck, Kris}},
  booktitle = {{ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)}},
  isbn = {{9781728176055}},
  issn = {{2379-190X}},
  keywords = {{speaker recognition,speaker verification,score calibration,RECOGNITION}},
  language = {{eng}},
  location = {{Toronto, Canada}},
  pages = {{5814--5818}},
  publisher = {{IEEE}},
  title = {{The Idlab voxsrc-20 submission : large margin fine-tuning and quality-aware score calibration in DNN based speaker verification}},
  url = {{http://doi.org/10.1109/icassp39728.2021.9414600}},
  year = {{2021}},
}