
Factor analysis for speaker segmentation and improved speaker diarization

Brecht Desplanques (UGent) , Kris Demuynck (UGent) and Jean-Pierre Martens (UGent)
Abstract
Speaker diarization includes two steps: speaker segmentation and speaker clustering. Speaker segmentation searches for speaker boundaries, whereas speaker clustering aims at grouping speech segments of the same speaker. In this work, the segmentation is improved by replacing the Bayesian Information Criterion (BIC) with a new iVector-based approach. Unlike BIC-based methods, which trigger on any acoustic dissimilarity, the proposed method suppresses phonetic variations and accentuates speaker differences. More specifically, our method generates boundaries based on the distance between two speaker factor vectors that are extracted on a frame-by-frame basis. The extraction relies on an eigenvoice matrix so that large differences between speaker factor vectors indicate a different speaker. A Mahalanobis-based distance measure, in which the covariance matrix compensates for the remaining and detrimental phonetic variability, is shown to generate accurate boundaries. The detected segments are clustered by a state-of-the-art iVector Probabilistic Linear Discriminant Analysis system. Experiments on the COST278 multilingual broadcast news database show relative reductions of 50% in boundary detection errors. The speaker error rate is reduced by 8% relative.
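The boundary-generation idea in the abstract (a Mahalanobis-based distance between two speaker factor vectors) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `mahalanobis_change_scores` and `pick_boundaries`, the window averaging, the threshold, the peak-picking rule, and the synthetic two-dimensional "speaker factors" are all assumptions made for illustration. The eigenvoice-based extraction of speaker factors is not reproduced here; the covariance matrix simply stands in for an estimate of the residual within-speaker (phonetic) variability.

```python
import numpy as np


def mahalanobis_change_scores(factors, cov, half_window):
    """Score each frame by the Mahalanobis distance between the mean
    factor vectors of the windows to its left and to its right.

    factors: (n_frames, dim) array of per-frame speaker factors (assumed given).
    cov: (dim, dim) covariance of the within-speaker variability (assumed given).
    """
    factors = np.asarray(factors, dtype=float)
    precision = np.linalg.inv(cov)
    n_frames = len(factors)
    scores = np.zeros(n_frames)
    for t in range(half_window, n_frames - half_window + 1):
        left = factors[t - half_window:t].mean(axis=0)
        right = factors[t:t + half_window].mean(axis=0)
        diff = left - right
        scores[t] = np.sqrt(diff @ precision @ diff)
    return scores


def pick_boundaries(scores, threshold):
    """Return frame indices that are local maxima at or above the threshold
    (a hypothetical decision rule, chosen only for this sketch)."""
    peaks = []
    for t in range(1, len(scores) - 1):
        is_peak = scores[t] > scores[t - 1] and scores[t] >= scores[t + 1]
        if is_peak and scores[t] >= threshold:
            peaks.append(t)
    return peaks


# Synthetic example: two "speakers" of 100 frames each, differing in mean.
rng = np.random.default_rng(0)
factors = np.vstack([
    rng.normal([0.0, 0.0], 0.1, size=(100, 2)),  # hypothetical speaker A
    rng.normal([3.0, 0.0], 0.1, size=(100, 2)),  # hypothetical speaker B
])
cov = 0.01 * np.eye(2)  # matches the noise variance used above
scores = mahalanobis_change_scores(factors, cov, half_window=20)
boundaries = pick_boundaries(scores, threshold=15.0)
print(boundaries)
```

On this synthetic input the score ramps up as the two windows straddle the speaker change, so the single detected boundary is expected close to frame 100. Dividing by the within-speaker covariance is what makes the measure insensitive to directions with large non-speaker variability, which is the role the abstract assigns to the covariance matrix.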
Keywords
clustering, factor analysis, speaker change detection, segmentation, speaker diarization

Downloads

  • 2015 - Brecht Desplanques et al. - Factor analysis for speaker segmentation and improved speaker diarization.pdf (full text, open access, PDF, 558.34 KB)

Citation

MLA
Desplanques, Brecht, et al. “Factor Analysis for Speaker Segmentation and Improved Speaker Diarization.” 16th Annual Conference of the International Speech Communication Association (INTERSPEECH 2015), Vols 1-5, International Speech Communication Association (ISCA), 2015, pp. 3081–85.
APA
Desplanques, B., Demuynck, K., & Martens, J.-P. (2015). Factor analysis for speaker segmentation and improved speaker diarization. 16th Annual Conference of the International Speech Communication Association (INTERSPEECH 2015), Vols 1-5, 3081–3085. Baixas, France: International Speech Communication Association (ISCA).
Chicago author-date
Desplanques, Brecht, Kris Demuynck, and Jean-Pierre Martens. 2015. “Factor Analysis for Speaker Segmentation and Improved Speaker Diarization.” In 16th Annual Conference of the International Speech Communication Association (INTERSPEECH 2015), Vols 1-5, 3081–85. Baixas, France: International Speech Communication Association (ISCA).
Chicago author-date (all authors)
Desplanques, Brecht, Kris Demuynck, and Jean-Pierre Martens. 2015. “Factor Analysis for Speaker Segmentation and Improved Speaker Diarization.” In 16th Annual Conference of the International Speech Communication Association (INTERSPEECH 2015), Vols 1-5, 3081–3085. Baixas, France: International Speech Communication Association (ISCA).
Vancouver
1. Desplanques B, Demuynck K, Martens J-P. Factor analysis for speaker segmentation and improved speaker diarization. In: 16th Annual conference of the International Speech Communication Association (INTERSPEECH 2015), vols 1-5. Baixas, France: International Speech Communication Association (ISCA); 2015. p. 3081–5.
IEEE
[1] B. Desplanques, K. Demuynck, and J.-P. Martens, “Factor analysis for speaker segmentation and improved speaker diarization,” in 16th Annual conference of the International Speech Communication Association (INTERSPEECH 2015), vols 1-5, Dresden, Germany, 2015, pp. 3081–3085.
BibTeX
@inproceedings{7159960,
  abstract     = {{Speaker diarization includes two steps: speaker segmentation and speaker clustering. Speaker segmentation searches for speaker boundaries, whereas speaker clustering aims at grouping speech segments of the same speaker. In this work, the segmentation is improved by replacing the Bayesian Information Criterion (BIC) with a new iVector-based approach. Unlike BIC-based methods, which trigger on any acoustic dissimilarity, the proposed method suppresses phonetic variations and accentuates speaker differences. More specifically, our method generates boundaries based on the distance between two speaker factor vectors that are extracted on a frame-by-frame basis. The extraction relies on an eigenvoice matrix so that large differences between speaker factor vectors indicate a different speaker. A Mahalanobis-based distance measure, in which the covariance matrix compensates for the remaining and detrimental phonetic variability, is shown to generate accurate boundaries. The detected segments are clustered by a state-of-the-art iVector Probabilistic Linear Discriminant Analysis system. Experiments on the COST278 multilingual broadcast news database show relative reductions of 50% in boundary detection errors. The speaker error rate is reduced by 8% relative.}},
  author       = {{Desplanques, Brecht and Demuynck, Kris and Martens, Jean-Pierre}},
  booktitle    = {{16th Annual conference of the International Speech Communication Association (INTERSPEECH 2015), vols 1-5}},
  isbn         = {{9781510817906}},
  keywords     = {{clustering,factor analysis,speaker change detection,segmentation,speaker diarization}},
  language     = {{eng}},
  location     = {{Dresden, Germany}},
  pages        = {{3081--3085}},
  publisher    = {{International Speech Communication Association (ISCA)}},
  title        = {{Factor analysis for speaker segmentation and improved speaker diarization}},
  year         = {{2015}},
}
