
Cross-lingual speaker verification with domain-balanced hard prototype mining and language-dependent score normalization
- Author
- Jenthe Thienpondt (UGent), Brecht Desplanques (UGent) and Kris Demuynck (UGent)
- Abstract
- In this paper we describe the top-scoring IDLab submission for the text-independent task of the Short-duration Speaker Verification (SdSV) Challenge 2020. The main difficulty of the challenge lies in the large degree of varying phonetic overlap between the potentially cross-lingual trials, along with the limited availability of in-domain DeepMine Farsi training data. We introduce domain-balanced hard prototype mining to fine-tune the state-of-the-art ECAPA-TDNN x-vector based speaker embedding extractor. The sample mining technique efficiently exploits the distances between the speaker prototypes of the popular AAM-softmax loss function to construct challenging training batches that are balanced at the domain level. To enhance the scoring of cross-lingual trials, we propose a language-dependent s-norm score normalization. The imposter cohort only contains data from the Farsi target domain, which simulates the enrollment data always being Farsi. If a Gaussian-backend language model detects that the test speaker embedding contains English, a cross-language compensation offset determined on the AAM-softmax speaker prototypes is subtracted from the maximum expected imposter mean score. A fusion of five systems with minor topological tweaks resulted in a final MinDCF and EER of 0.065 and 1.45%, respectively, on the SdSVC evaluation set.
- Keywords
- speaker recognition, cross-lingual speaker verification, x-vectors, SdSV Challenge 2020
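The language-dependent s-norm summarized in the abstract can be pictured as a small scoring routine. The sketch below is a minimal illustration only, assuming cosine scoring and an adaptive top-N selection over a Farsi-only imposter cohort; the function names, the `top_n` parameter, and the exact way the compensation offset enters the statistics are assumptions for illustration, not the authors' implementation.

```python
import numpy as np

def cosine(a, b):
    # Cosine similarity between two speaker embeddings.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def language_dependent_snorm(enroll, test, farsi_cohort,
                             test_is_english, offset, top_n=200):
    """Hedged sketch of language-dependent s-norm (names are illustrative).

    farsi_cohort: matrix of imposter embeddings drawn only from the Farsi
    target domain, mirroring the assumption that enrollment is always Farsi.
    offset: stands in for the cross-language compensation value the paper
    derives from the AAM-softmax speaker prototypes.
    """
    raw = cosine(enroll, test)

    # Adaptive cohort statistics against the Farsi-only imposter cohort
    # (top-N selection is a common choice; the abstract does not specify it).
    enroll_scores = np.sort([cosine(enroll, c) for c in farsi_cohort])[-top_n:]
    test_scores = np.sort([cosine(test, c) for c in farsi_cohort])[-top_n:]

    mu_e, sd_e = enroll_scores.mean(), enroll_scores.std()
    mu_t, sd_t = test_scores.mean(), test_scores.std()

    # If the language backend flags the test side as English, subtract the
    # cross-language compensation offset from the expected imposter mean.
    if test_is_english:
        mu_t -= offset

    # Symmetric s-norm of the raw verification score.
    return 0.5 * ((raw - mu_e) / sd_e + (raw - mu_t) / sd_t)
```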
Downloads
- DS369.pdf - full text (Published version) | open access | 263.59 KB
Citation
Please use this url to cite or link to this publication: http://hdl.handle.net/1854/LU-8680075
- MLA
- Thienpondt, Jenthe, et al. “Cross-Lingual Speaker Verification with Domain-Balanced Hard Prototype Mining and Language-Dependent Score Normalization.” INTERSPEECH 2020, International Speech Communication Association (ISCA), 2020, pp. 756–60, doi:10.21437/Interspeech.2020-2662.
- APA
- Thienpondt, J., Desplanques, B., & Demuynck, K. (2020). Cross-lingual speaker verification with domain-balanced hard prototype mining and language-dependent score normalization. INTERSPEECH 2020, 756–760. https://doi.org/10.21437/Interspeech.2020-2662
- Chicago author-date
- Thienpondt, Jenthe, Brecht Desplanques, and Kris Demuynck. 2020. “Cross-Lingual Speaker Verification with Domain-Balanced Hard Prototype Mining and Language-Dependent Score Normalization.” In INTERSPEECH 2020, 756–60. International Speech Communication Association (ISCA). https://doi.org/10.21437/Interspeech.2020-2662.
- Chicago author-date (all authors)
- Thienpondt, Jenthe, Brecht Desplanques, and Kris Demuynck. 2020. “Cross-Lingual Speaker Verification with Domain-Balanced Hard Prototype Mining and Language-Dependent Score Normalization.” In INTERSPEECH 2020, 756–760. International Speech Communication Association (ISCA). doi:10.21437/Interspeech.2020-2662.
- Vancouver
- 1. Thienpondt J, Desplanques B, Demuynck K. Cross-lingual speaker verification with domain-balanced hard prototype mining and language-dependent score normalization. In: INTERSPEECH 2020. International Speech Communication Association (ISCA); 2020. p. 756–60.
- IEEE
- [1] J. Thienpondt, B. Desplanques, and K. Demuynck, “Cross-lingual speaker verification with domain-balanced hard prototype mining and language-dependent score normalization,” in INTERSPEECH 2020, Online (Shanghai, China), 2020, pp. 756–760.
@inproceedings{8680075,
  abstract     = {{In this paper we describe the top-scoring IDLab submission for the text-independent task of the Short-duration Speaker Verification (SdSV) Challenge 2020. The main difficulty of the challenge exists in the large degree of varying phonetic overlap between the potentially cross-lingual trials, along with the limited availability of in-domain DeepMine Farsi training data. We introduce domain-balanced hard prototype mining to finetune the state-of-the-art ECAPA-TDNN x-vector based speaker embedding extractor. The sample mining technique efficiently exploits speaker distances between the speaker prototypes of the popular AAM-softmax loss function to construct challenging training batches that are balanced on the domain-level. To enhance the scoring of cross-lingual trials, we propose a language-dependent s-norm score normalization. The imposter cohort only contains data from the Farsi target-domain which simulates the enrollment data always being Farsi. In case a Gaussian-Backend language model detects the test speaker embedding to contain English, a cross-language compensation offset determined on the AAM-softmax speaker prototypes is subtracted from the maximum expected imposter mean score. A fusion of five systems with minor topological tweaks resulted in a final MinDCF and EER of 0.065 and 1.45% respectively on the SdSVC evaluation set.}},
  author       = {{Thienpondt, Jenthe and Desplanques, Brecht and Demuynck, Kris}},
  booktitle    = {{INTERSPEECH 2020}},
  issn         = {{2308-457X}},
  keywords     = {{speaker recognition, cross-lingual speaker verification, x-vectors, SdSV Challenge 2020}},
  language     = {{eng}},
  location     = {{Online (Shanghai, China)}},
  pages        = {{756--760}},
  publisher    = {{International Speech Communication Association (ISCA)}},
  title        = {{Cross-lingual speaker verification with domain-balanced hard prototype mining and language-dependent score normalization}},
  url          = {{http://dx.doi.org/10.21437/Interspeech.2020-2662}},
  year         = {{2020}},
}