
Factor analysis for speaker segmentation and improved speaker diarization
- Author
- Brecht Desplanques (UGent), Kris Demuynck (UGent) and Jean-Pierre Martens (UGent)
- Abstract
- Speaker diarization includes two steps: speaker segmentation and speaker clustering. Speaker segmentation searches for speaker boundaries, whereas speaker clustering aims at grouping speech segments of the same speaker. In this work, the segmentation is improved by replacing the Bayesian Information Criterion (BIC) with a new iVector-based approach. Unlike BIC-based methods, which trigger on any acoustic dissimilarity, the proposed method suppresses phonetic variations and accentuates speaker differences. More specifically, our method generates boundaries based on the distance between two speaker factor vectors that are extracted on a frame-by-frame basis. The extraction relies on an eigenvoice matrix, so that a large difference between speaker factor vectors indicates a speaker change. A Mahalanobis-based distance measure, in which the covariance matrix compensates for the remaining, detrimental phonetic variability, is shown to generate accurate boundaries. The detected segments are clustered by a state-of-the-art iVector Probabilistic Linear Discriminant Analysis (PLDA) system. Experiments on the COST278 multilingual broadcast news database show a 50% relative reduction in boundary detection errors and an 8% relative reduction in speaker error rate.
- Keywords
- clustering, factor analysis, speaker change detection, segmentation, speaker diarization
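The abstract's central idea, scoring each candidate frame by a Mahalanobis distance between the speaker factor vectors of the windows on either side of it, with a covariance matrix that absorbs within-speaker (phonetic) variability, can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the eigenvoice-based extraction of the speaker factor vectors is not shown and the left/right window vectors are taken as given; the covariance is assumed to be estimated from difference vectors between windows of the same speaker; and the function names, the ridge term `reg`, and the simple peak picker with `threshold` and `min_gap` are all hypothetical choices.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]
Matrix = List[List[float]]


def within_speaker_covariance(diffs: Sequence[Vector], reg: float = 1e-6) -> Matrix:
    """Covariance W of difference vectors between windows of the SAME speaker.

    Assumption: such differences are zero-mean and mostly reflect phonetic
    variability; `reg` (hypothetical) adds a small ridge so W stays invertible.
    """
    n, dim = len(diffs), len(diffs[0])
    return [[sum(d[i] * d[j] for d in diffs) / n + (reg if i == j else 0.0)
             for j in range(dim)] for i in range(dim)]


def cholesky(a: Matrix) -> Matrix:
    """Lower-triangular L with a = L L^T; a must be symmetric positive definite."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                if s <= 0.0:
                    raise ValueError("matrix is not positive definite")
                low[i][j] = math.sqrt(s)
            else:
                low[i][j] = s / low[j][j]
    return low


def mahalanobis(x: Vector, y: Vector, low: Matrix) -> float:
    """sqrt((x - y)^T W^-1 (x - y)), given the Cholesky factor `low` of W."""
    d = [xi - yi for xi, yi in zip(x, y)]
    # Forward substitution solves low @ z = d, so d^T W^-1 d = |z|^2.
    z: List[float] = []
    for i, di in enumerate(d):
        z.append((di - sum(low[i][k] * z[k] for k in range(i))) / low[i][i])
    return math.sqrt(sum(v * v for v in z))


def pick_boundaries(curve: Sequence[float], threshold: float, min_gap: int) -> List[int]:
    """Hypothetical peak picker: local maxima above `threshold`, at least
    `min_gap` frames apart (the higher peak wins inside the gap)."""
    peaks: List[int] = []
    for t in range(1, len(curve) - 1):
        if curve[t] > threshold and curve[t] >= curve[t - 1] and curve[t] > curve[t + 1]:
            if peaks and t - peaks[-1] < min_gap:
                if curve[t] > curve[peaks[-1]]:
                    peaks[-1] = t
            else:
                peaks.append(t)
    return peaks


if __name__ == "__main__":
    # Toy 2-D "speaker factors": same-speaker differences vary a lot along
    # dimension 0 (phonetic) and little along dimension 1 (speaker).
    diffs = [[1.0, 0.1], [-1.0, -0.1], [1.0, -0.1], [-1.0, 0.1]]
    low = cholesky(within_speaker_covariance(diffs))
    # Left/right window vectors per candidate frame (extraction not shown).
    left = [[0.0, 0.0] for _ in range(7)]
    right = [[0.5, 0.0], [1.0, 0.0], [0.8, 0.0], [0.0, 1.0],
             [0.9, 0.0], [1.0, 0.0], [0.3, 0.0]]
    curve = [mahalanobis(lv, rv, low) for lv, rv in zip(left, right)]
    print(pick_boundaries(curve, threshold=3.0, min_gap=2))  # prints [3]
```

In the toy data every left/right pair lies within a Euclidean distance of 1.0, yet only frame 3, whose difference falls along the low-variance (speaker-related) dimension, produces a large Mahalanobis distance and is reported as a boundary. A plain Euclidean distance would score frames 1, 3 and 5 identically, which is the kind of phonetically triggered false alarm the covariance weighting is meant to suppress.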
Downloads
- 2015 - Brecht Desplanques et al. - Factor analysis for speaker segmentation and improved speaker diarization.pdf (full text, open access, 558.34 KB)
Citation
Please use this url to cite or link to this publication: http://hdl.handle.net/1854/LU-7159960
- MLA
- Desplanques, Brecht, et al. “Factor Analysis for Speaker Segmentation and Improved Speaker Diarization.” 16th Annual Conference of the International Speech Communication Association (INTERSPEECH 2015), Vols 1-5, International Speech Communication Association (ISCA), 2015, pp. 3081–85.
- APA
- Desplanques, B., Demuynck, K., & Martens, J.-P. (2015). Factor analysis for speaker segmentation and improved speaker diarization. 16th Annual Conference of the International Speech Communication Association (INTERSPEECH 2015), Vols 1-5, 3081–3085. Baixas, France: International Speech Communication Association (ISCA).
- Chicago author-date
- Desplanques, Brecht, Kris Demuynck, and Jean-Pierre Martens. 2015. “Factor Analysis for Speaker Segmentation and Improved Speaker Diarization.” In 16th Annual Conference of the International Speech Communication Association (INTERSPEECH 2015), Vols 1-5, 3081–85. Baixas, France: International Speech Communication Association (ISCA).
- Vancouver
- 1. Desplanques B, Demuynck K, Martens J-P. Factor analysis for speaker segmentation and improved speaker diarization. In: 16th Annual conference of the International Speech Communication Association (INTERSPEECH 2015), vols 1-5. Baixas, France: International Speech Communication Association (ISCA); 2015. p. 3081–5.
- IEEE
- [1] B. Desplanques, K. Demuynck, and J.-P. Martens, “Factor analysis for speaker segmentation and improved speaker diarization,” in 16th Annual Conference of the International Speech Communication Association (INTERSPEECH 2015), vols 1-5, Dresden, Germany, 2015, pp. 3081–3085.
@inproceedings{7159960,
  abstract = {{Speaker diarization includes two steps: speaker segmentation and speaker clustering. Speaker segmentation searches for speaker boundaries, whereas speaker clustering aims at grouping speech segments of the same speaker. In this work, the segmentation is improved by replacing the Bayesian Information Criterion (BIC) with a new iVector-based approach. Unlike BIC-based methods, which trigger on any acoustic dissimilarity, the proposed method suppresses phonetic variations and accentuates speaker differences. More specifically, our method generates boundaries based on the distance between two speaker factor vectors that are extracted on a frame-by-frame basis. The extraction relies on an eigenvoice matrix, so that a large difference between speaker factor vectors indicates a speaker change. A Mahalanobis-based distance measure, in which the covariance matrix compensates for the remaining, detrimental phonetic variability, is shown to generate accurate boundaries. The detected segments are clustered by a state-of-the-art iVector Probabilistic Linear Discriminant Analysis (PLDA) system. Experiments on the COST278 multilingual broadcast news database show a 50% relative reduction in boundary detection errors and an 8% relative reduction in speaker error rate.}},
  author = {{Desplanques, Brecht and Demuynck, Kris and Martens, Jean-Pierre}},
  booktitle = {{16th Annual Conference of the International Speech Communication Association (INTERSPEECH 2015), vols 1-5}},
  isbn = {{9781510817906}},
  keywords = {{clustering,factor analysis,speaker change detection,segmentation,speaker diarization}},
  language = {{eng}},
  location = {{Dresden, Germany}},
  pages = {{3081--3085}},
  publisher = {{International Speech Communication Association (ISCA)}},
  title = {{Factor analysis for speaker segmentation and improved speaker diarization}},
  year = {{2015}},
}