
The Idlab voxsrc-20 submission : large margin fine-tuning and quality-aware score calibration in DNN based speaker verification

Abstract
In this paper we propose and analyse a large margin fine-tuning strategy and a quality-aware score calibration in text-independent speaker verification. Large margin fine-tuning is a secondary training stage for DNN-based speaker verification systems trained with margin-based loss functions. It lets the network create more robust speaker embeddings by allowing longer training utterances to be used in combination with a more aggressive margin penalty. Score calibration is a common practice in speaker verification systems to map output scores to well-calibrated log-likelihood ratios, which can be converted to interpretable probabilities. By including quality features in the calibration system, the decision thresholds of the evaluation metrics become quality-dependent and more consistent across varying trial conditions. Applying both enhancements to the ECAPA-TDNN architecture leads to state-of-the-art results on all publicly available VoxCeleb1 test sets and contributed to our winning submissions in the supervised verification tracks of the VoxCeleb Speaker Recognition Challenge 2020.
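As an illustration of the calibration idea described in the abstract, the following is a minimal sketch, not the authors' implementation. It assumes a linear (logistic-regression style) calibration in which the raw verification score and a vector of quality features are combined into a log-likelihood ratio (LLR), and then converts that LLR into a posterior probability for a chosen target prior. The function names, the weights, and the choice of quality feature (a log-duration value) are all hypothetical; the abstract does not specify them.

```python
import math


def calibrated_llr(score, quality, w_score, w_quality, bias):
    """Map a raw score plus quality features to an LLR.

    Assumption: an affine calibration model. In practice the weights
    and bias would be fitted on held-out trials; here they are made up.
    """
    quality_term = sum(w * q for w, q in zip(w_quality, quality))
    return w_score * score + quality_term + bias


def llr_to_posterior(llr, target_prior):
    """Convert an LLR to P(target | evidence) for a given target prior."""
    log_odds = llr + math.log(target_prior / (1.0 - target_prior))
    # Numerically stable logistic function.
    if log_odds >= 0:
        return 1.0 / (1.0 + math.exp(-log_odds))
    e = math.exp(log_odds)
    return e / (1.0 + e)


if __name__ == "__main__":
    # Hypothetical trial: cosine score 0.45, one quality feature
    # (log of the shorter utterance duration in seconds).
    llr = calibrated_llr(
        score=0.45,
        quality=[math.log(4.0)],
        w_score=12.0,      # hypothetical fitted weight
        w_quality=[0.8],   # hypothetical fitted weight
        bias=-5.0,         # hypothetical fitted offset
    )
    print("LLR:", llr)
    print("P(target):", llr_to_posterior(llr, target_prior=0.05))
```

Because the quality term shifts the LLR for a fixed raw score, the raw-score threshold that corresponds to a given LLR threshold varies with the quality features, which is the sense in which decision thresholds become quality-dependent.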
Keywords
speaker recognition, speaker verification, score calibration, RECOGNITION

Downloads

  • (...).pdf: full text (Published version) | UGent only | PDF | 1.92 MB
  • DS430 acc.pdf: full text (Accepted manuscript) | open access | PDF | 230.52 KB

Citation

MLA
Thienpondt, Jenthe, et al. “The Idlab Voxsrc-20 Submission : Large Margin Fine-Tuning and Quality-Aware Score Calibration in DNN Based Speaker Verification.” ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), IEEE, 2021, pp. 5814–18, doi:10.1109/icassp39728.2021.9414600.
APA
Thienpondt, J., Desplanques, B., & Demuynck, K. (2021). The Idlab voxsrc-20 submission : large margin fine-tuning and quality-aware score calibration in DNN based speaker verification. ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 5814–5818. https://doi.org/10.1109/icassp39728.2021.9414600
Chicago author-date
Thienpondt, Jenthe, Brecht Desplanques, and Kris Demuynck. 2021. “The Idlab Voxsrc-20 Submission : Large Margin Fine-Tuning and Quality-Aware Score Calibration in DNN Based Speaker Verification.” In ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 5814–18. IEEE. https://doi.org/10.1109/icassp39728.2021.9414600.
Chicago author-date (all authors)
Thienpondt, Jenthe, Brecht Desplanques, and Kris Demuynck. 2021. “The Idlab Voxsrc-20 Submission : Large Margin Fine-Tuning and Quality-Aware Score Calibration in DNN Based Speaker Verification.” In ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 5814–5818. IEEE. doi:10.1109/icassp39728.2021.9414600.
Vancouver
1. Thienpondt J, Desplanques B, Demuynck K. The Idlab voxsrc-20 submission : large margin fine-tuning and quality-aware score calibration in DNN based speaker verification. In: ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE; 2021. p. 5814–8.
IEEE
[1] J. Thienpondt, B. Desplanques, and K. Demuynck, “The Idlab voxsrc-20 submission : large margin fine-tuning and quality-aware score calibration in DNN based speaker verification,” in ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Toronto, Canada, 2021, pp. 5814–5818.
BibTeX
@inproceedings{8713072,
  abstract     = {{In this paper we propose and analyse a large margin fine-tuning strategy and a quality-aware score calibration in text-independent speaker verification. Large margin fine-tuning is a secondary training stage for DNN based speaker verification systems trained with margin-based loss functions. It enables the network to create more robust speaker embeddings by enabling the use of longer training utterances in combination with a more aggressive margin penalty. Score calibration is a common practice in speaker verification systems to map output scores to well-calibrated log-likelihood-ratios, which can be converted to interpretable probabilities. By including quality features in the calibration system, the decision thresholds of the evaluation metrics become quality-dependent and more consistent across varying trial conditions. Applying both enhancements on the ECAPA-TDNN architecture leads to state-of-the-art results on all publicly available VoxCeleb1 test sets and contributed to our winning submissions in the supervised verification tracks of the VoxCeleb Speaker Recognition Challenge 2020.}},
  author       = {{Thienpondt, Jenthe and Desplanques, Brecht and Demuynck, Kris}},
  booktitle    = {{ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)}},
  isbn         = {{9781728176055}},
  issn         = {{2379-190X}},
  keywords     = {{speaker recognition,speaker verification,score calibration,RECOGNITION}},
  language     = {{eng}},
  location     = {{Toronto, Canada}},
  pages        = {{5814--5818}},
  publisher    = {{IEEE}},
  title        = {{The Idlab voxsrc-20 submission : large margin fine-tuning and quality-aware score calibration in DNN based speaker verification}},
  url          = {{http://doi.org/10.1109/icassp39728.2021.9414600}},
  year         = {{2021}},
}
