
Cross-lingual speaker verification with domain-balanced hard prototype mining and language-dependent score normalization

Jenthe Thienpondt (UGent), Brecht Desplanques (UGent) and Kris Demuynck (UGent)
Abstract
In this paper we describe the top-scoring IDLab submission for the text-independent task of the Short-duration Speaker Verification (SdSV) Challenge 2020. The main difficulty of the challenge lies in the large degree of varying phonetic overlap between the potentially cross-lingual trials, along with the limited availability of in-domain DeepMine Farsi training data. We introduce domain-balanced hard prototype mining to fine-tune the state-of-the-art ECAPA-TDNN x-vector based speaker embedding extractor. The sample mining technique efficiently exploits speaker distances between the speaker prototypes of the popular AAM-softmax loss function to construct challenging training batches that are balanced at the domain level. To enhance the scoring of cross-lingual trials, we propose a language-dependent s-norm score normalization. The imposter cohort only contains data from the Farsi target domain, which simulates the enrollment data always being Farsi. In case a Gaussian-backend language model detects that the test speaker embedding contains English, a cross-language compensation offset determined on the AAM-softmax speaker prototypes is subtracted from the maximum expected imposter mean score. A fusion of five systems with minor topological tweaks resulted in a final MinDCF and EER of 0.065 and 1.45%, respectively, on the SdSVC evaluation set.
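
As a rough illustration of the two techniques summarized above, the sketches below are based only on the abstract; the function names, parameter values (speakers_per_domain, top_n, offset) and the exact selection logic are assumptions, not the authors' implementation.

The first sketch builds a domain-balanced batch of hard speakers by comparing the AAM-softmax speaker prototypes (the rows of the classification weight matrix) with cosine similarity:

import numpy as np

def hard_prototype_batch(prototypes, speaker_domains, domains,
                         speakers_per_domain=4, rng=None):
    # prototypes: (num_speakers, dim) AAM-softmax weight rows, assumed L2-normalized
    # speaker_domains: sequence mapping speaker index -> domain label
    rng = rng or np.random.default_rng()
    sims = prototypes @ prototypes.T            # cosine similarity between prototypes
    np.fill_diagonal(sims, -np.inf)             # a speaker is never its own hard impostor
    batch = []
    for dom in domains:                         # balance the batch over domains
        dom_idx = np.flatnonzero(np.asarray(speaker_domains) == dom)
        seed = rng.choice(dom_idx)              # random seed speaker for this domain
        dom_sims = sims[seed, dom_idx]          # hardest speakers = closest prototypes
        hardest = dom_idx[np.argsort(-dom_sims)[:speakers_per_domain - 1]]
        batch.extend([int(seed), *hardest.tolist()])
    return batch

The second sketch applies adaptive s-norm with a Farsi-only impostor cohort and subtracts a cross-language offset from the test-side top-N impostor mean whenever the test utterance is detected as English; the offset value and top_n are placeholders:

def language_dependent_snorm(raw_score, enroll_emb, test_emb, farsi_cohort,
                             test_is_english, offset=0.1, top_n=200):
    # farsi_cohort: (num_cohort, dim) Farsi impostor embeddings, assumed L2-normalized
    def top_stats(emb, shift=0.0):
        scores = farsi_cohort @ emb             # cosine scores against the cohort
        top = np.sort(scores)[-top_n:]          # adaptive s-norm: keep top-N impostor scores
        return top.mean() - shift, top.std()    # (shifted) impostor mean and std
    mu_e, sd_e = top_stats(enroll_emb)          # enrollment side: always Farsi
    mu_t, sd_t = top_stats(test_emb, offset if test_is_english else 0.0)
    return 0.5 * ((raw_score - mu_e) / sd_e + (raw_score - mu_t) / sd_t)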

Downloads

  • DS369.pdf — full text (published version) | open access | PDF | 263.59 KB

Citation

Please use this URL to cite or link to this publication:

MLA
Thienpondt, Jenthe, et al. “Cross-Lingual Speaker Verification with Domain-Balanced Hard Prototype Mining and Language-Dependent Score Normalization.” Proc. Interspeech 2020, International Speech Communication Association (ISCA), 2020, pp. 756–60, doi:10.21437/Interspeech.2020-2662.
APA
Thienpondt, J., Desplanques, B., & Demuynck, K. (2020). Cross-lingual speaker verification with domain-balanced hard prototype mining and language-dependent score normalization. In Proc. Interspeech 2020 (pp. 756–760). Online: International Speech Communication Association (ISCA). https://doi.org/10.21437/Interspeech.2020-2662
Chicago author-date
Thienpondt, Jenthe, Brecht Desplanques, and Kris Demuynck. 2020. “Cross-Lingual Speaker Verification with Domain-Balanced Hard Prototype Mining and Language-Dependent Score Normalization.” In Proc. Interspeech 2020, 756–60. International Speech Communication Association (ISCA). https://doi.org/10.21437/Interspeech.2020-2662.
Chicago author-date (all authors)
Thienpondt, Jenthe, Brecht Desplanques, and Kris Demuynck. 2020. “Cross-Lingual Speaker Verification with Domain-Balanced Hard Prototype Mining and Language-Dependent Score Normalization.” In Proc. Interspeech 2020, 756–760. International Speech Communication Association (ISCA). doi:10.21437/Interspeech.2020-2662.
Vancouver
1. Thienpondt J, Desplanques B, Demuynck K. Cross-lingual speaker verification with domain-balanced hard prototype mining and language-dependent score normalization. In: Proc Interspeech 2020. International Speech Communication Association (ISCA); 2020. p. 756–60.
IEEE
[1] J. Thienpondt, B. Desplanques, and K. Demuynck, “Cross-lingual speaker verification with domain-balanced hard prototype mining and language-dependent score normalization,” in Proc. Interspeech 2020, Online, 2020, pp. 756–760.
@inproceedings{8680075,
  abstract     = {{In this paper we describe the top-scoring IDLab submission for the text-independent task of the Short-duration Speaker Verification (SdSV) Challenge 2020. The main difficulty of the challenge exists in the large degree of varying phonetic overlap between the potentially cross-lingual trials, along with the limited availability of in-domain DeepMine Farsi training data. We introduce domain-balanced hard prototype mining to finetune the state-of-the-art ECAPA-TDNN x-vector based speaker embedding extractor. The sample mining technique efficiently exploits speaker distances between the speaker prototypes of the popular AAM-softmax loss function to construct challenging training batches that are balanced on the domain-level. To enhance the scoring of cross-lingual trials, we propose a language-dependent s-norm score normalization. The imposter cohort only contains data from the Farsi target-domain which simulates the enrollment data always being Farsi. In case a Gaussian-Backend language model detects the test speaker embedding to contain English, a cross-language compensation offset determined on the AAM-softmax speaker prototypes is subtracted from the maximum expected imposter mean score. A fusion of five systems with minor topological tweaks resulted in a final MinDCF and EER of 0.065 and 1.45% respectively on the SdSVC evaluation set.}},
  author       = {{Thienpondt, Jenthe and Desplanques, Brecht and Demuynck, Kris}},
  booktitle    = {{Proc. Interspeech 2020}},
  issn         = {{1990-9772}},
  language     = {{eng}},
  location     = {{Online}},
  pages        = {{756--760}},
  publisher    = {{International Speech Communication Association (ISCA)}},
  title        = {{Cross-lingual speaker verification with domain-balanced hard prototype mining and language-dependent score normalization}},
  url          = {{http://dx.doi.org/10.21437/Interspeech.2020-2662}},
  year         = {{2020}},
}
