
DOA-guided source separation with direction-based initialization and time annotations using complex angular central Gaussian mixture models
- Author
- Alexander Bohlender (UGent), Lucas Van Severen (UGent), Jonathan Sterckx (UGent) and Nilesh Madhu (UGent)
- Project
- Enabling Effortless Communication Under Adverse Conditions by Exploitation of Direction-of-Arrival Estimates and Other A-Priori Knowledge in Joint Acoustic Source Separation and Dereverberation
- Robust speech capture and enhancement using ad hoc distributed microphone arrays by integrating and embedding domain-specific signal models within deep-learning frameworks
- Abstract
- By means of spatial clustering and time-frequency masking, a mixture of multiple speakers and noise can be separated into the underlying signal components. The parameters of a model, such as a complex angular central Gaussian mixture model (cACGMM), can be determined based on the given signal mixture itself. Then, no misfit between training and testing conditions arises, as opposed to approaches that require labeled datasets to be trained. Whereas the separation can be performed in a completely unsupervised way, it may be beneficial to take advantage of a priori knowledge. The parameter estimation is sensitive to the initialization, and it is necessary to address the frequency permutation problem. In this paper, we therefore consider three techniques to overcome these limitations using direction of arrival (DOA) estimates. First, we propose an initialization with simple DOA-based masks. Secondly, we derive speaker-specific time annotations from the same masks in order to constrain the cACGMM. Thirdly, we employ an approach where the mixture components are specific to each DOA instead of each speaker. We conduct experiments with sudden DOA changes, as well as a gradually moving speaker. The results demonstrate that the DOA-based initialization in particular is effective in overcoming both of the described limitations. In this case, even methods based on normally unavailable oracle information are not observed to be more beneficial to the permutation resolution or the initialization. Lastly, we also show that the proposed DOA-guided source separation works quite robustly in the presence of adverse conditions and realistic DOA estimation errors.
- Keywords
- BLIND SOURCE SEPARATION, NETWORKS, Guided source separation, Spatial clustering, Direction of arrival, Time-frequency masks
Downloads
- DS531.pdf
- full text (Published version)
- open access
- 1.95 MB
Citation
Please use this URL to cite or link to this publication: http://hdl.handle.net/1854/LU-8758927
- MLA
- Bohlender, Alexander, et al. “DOA-Guided Source Separation with Direction-Based Initialization and Time Annotations Using Complex Angular Central Gaussian Mixture Models.” EURASIP JOURNAL ON AUDIO SPEECH AND MUSIC PROCESSING, vol. 2022, no. 1, 2022, doi:10.1186/s13636-022-00246-7.
- APA
- Bohlender, A., Van Severen, L., Sterckx, J., & Madhu, N. (2022). DOA-guided source separation with direction-based initialization and time annotations using complex angular central Gaussian mixture models. EURASIP JOURNAL ON AUDIO SPEECH AND MUSIC PROCESSING, 2022(1). https://doi.org/10.1186/s13636-022-00246-7
- Chicago author-date
- Bohlender, Alexander, Lucas Van Severen, Jonathan Sterckx, and Nilesh Madhu. 2022. “DOA-Guided Source Separation with Direction-Based Initialization and Time Annotations Using Complex Angular Central Gaussian Mixture Models.” EURASIP JOURNAL ON AUDIO SPEECH AND MUSIC PROCESSING 2022 (1). https://doi.org/10.1186/s13636-022-00246-7.
- Chicago author-date (all authors)
- Bohlender, Alexander, Lucas Van Severen, Jonathan Sterckx, and Nilesh Madhu. 2022. “DOA-Guided Source Separation with Direction-Based Initialization and Time Annotations Using Complex Angular Central Gaussian Mixture Models.” EURASIP JOURNAL ON AUDIO SPEECH AND MUSIC PROCESSING 2022 (1). doi:10.1186/s13636-022-00246-7.
- Vancouver
- 1. Bohlender A, Van Severen L, Sterckx J, Madhu N. DOA-guided source separation with direction-based initialization and time annotations using complex angular central Gaussian mixture models. EURASIP JOURNAL ON AUDIO SPEECH AND MUSIC PROCESSING. 2022;2022(1).
- IEEE
- [1] A. Bohlender, L. Van Severen, J. Sterckx, and N. Madhu, “DOA-guided source separation with direction-based initialization and time annotations using complex angular central Gaussian mixture models,” EURASIP JOURNAL ON AUDIO SPEECH AND MUSIC PROCESSING, vol. 2022, no. 1, 2022.
- BibTeX
@article{8758927,
  abstract = {{By means of spatial clustering and time-frequency masking, a mixture of multiple speakers and noise can be separated into the underlying signal components. The parameters of a model, such as a complex angular central Gaussian mixture model (cACGMM), can be determined based on the given signal mixture itself. Then, no misfit between training and testing conditions arises, as opposed to approaches that require labeled datasets to be trained. Whereas the separation can be performed in a completely unsupervised way, it may be beneficial to take advantage of a priori knowledge. The parameter estimation is sensitive to the initialization, and it is necessary to address the frequency permutation problem. In this paper, we therefore consider three techniques to overcome these limitations using direction of arrival (DOA) estimates. First, we propose an initialization with simple DOA-based masks. Secondly, we derive speaker specific time annotations from the same masks in order to constrain the cACGMM. Thirdly, we employ an approach where the mixture components are specific to each DOA instead of each speaker. We conduct experiments with sudden DOA changes, as well as a gradually moving speaker. The results demonstrate that particularly the DOA-based initialization is effective to overcome both of the described limitations. In this case, even methods based on normally unavailable oracle information are not observed to be more beneficial to the permutation resolution or the initialization. Lastly, we also show that the proposed DOA-guided source separation works quite robustly in the presence of adverse conditions and realistic DOA estimation errors.}},
  articleno = {{16}},
  author = {{Bohlender, Alexander and Van Severen, Lucas and Sterckx, Jonathan and Madhu, Nilesh}},
  issn = {{1687-4722}},
  journal = {{EURASIP JOURNAL ON AUDIO SPEECH AND MUSIC PROCESSING}},
  keywords = {{BLIND SOURCE SEPARATION,NETWORKS,Guided source separation,Spatial clustering,Direction of arrival,Time-frequency masks}},
  language = {{eng}},
  number = {{1}},
  pages = {{21}},
  title = {{DOA-guided source separation with direction-based initialization and time annotations using complex angular central Gaussian mixture models}},
  url = {{http://doi.org/10.1186/s13636-022-00246-7}},
  volume = {{2022}},
  year = {{2022}},
}