
DOA-guided source separation with direction-based initialization and time annotations using complex angular central Gaussian mixture models

Abstract
By means of spatial clustering and time-frequency masking, a mixture of multiple speakers and noise can be separated into the underlying signal components. The parameters of a model, such as a complex angular central Gaussian mixture model (cACGMM), can be determined based on the given signal mixture itself. Then, no misfit between training and testing conditions arises, as opposed to approaches that require labeled datasets to be trained. Whereas the separation can be performed in a completely unsupervised way, it may be beneficial to take advantage of a priori knowledge. The parameter estimation is sensitive to the initialization, and it is necessary to address the frequency permutation problem. In this paper, we therefore consider three techniques to overcome these limitations using direction of arrival (DOA) estimates. First, we propose an initialization with simple DOA-based masks. Secondly, we derive speaker specific time annotations from the same masks in order to constrain the cACGMM. Thirdly, we employ an approach where the mixture components are specific to each DOA instead of each speaker. We conduct experiments with sudden DOA changes, as well as a gradually moving speaker. The results demonstrate that particularly the DOA-based initialization is effective to overcome both of the described limitations. In this case, even methods based on normally unavailable oracle information are not observed to be more beneficial to the permutation resolution or the initialization. Lastly, we also show that the proposed DOA-guided source separation works quite robustly in the presence of adverse conditions and realistic DOA estimation errors.
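As a rough illustration of what "simple DOA-based masks" could look like, the sketch below assigns each time-frequency bin to the nearest target direction and sends bins that lie too far from every target to an extra noise class. This is not the authors' implementation: the function names, the per-bin DOA input, the nearest-direction rule, and the `tolerance_deg` threshold are all assumptions made for the example. The resulting one-hot masks are the kind of quantity that might be used to initialize the class posteriors of a mixture model such as the cACGMM.

```python
def angular_distance_deg(a, b):
    """Wrapped angular distance between two angles, in degrees (0 to 180)."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)


def doa_based_masks(tf_doa_deg, source_doa_deg, tolerance_deg=15.0):
    """Hypothetical helper: build one-hot masks from per-bin DOA estimates.

    tf_doa_deg:     nested list [T][F] of DOA estimates per time-frequency bin
    source_doa_deg: list of K target directions (assumed known or estimated)
    tolerance_deg:  assumed threshold; bins farther than this from every
                    target direction are assigned to the noise class

    Returns masks[k][t][f] with K + 1 classes; the last class is noise.
    """
    n_classes = len(source_doa_deg) + 1
    masks = [[[0.0 for _ in row] for row in tf_doa_deg] for _ in range(n_classes)]
    for t, row in enumerate(tf_doa_deg):
        for f, doa in enumerate(row):
            dists = [angular_distance_deg(doa, s) for s in source_doa_deg]
            # Index of the closest target direction for this bin.
            k = min(range(len(dists)), key=dists.__getitem__)
            if dists[k] > tolerance_deg:
                k = n_classes - 1  # too far from every target: noise class
            masks[k][t][f] = 1.0
    return masks
```

For example, `doa_based_masks([[10.0, 100.0], [350.0, 200.0]], [0.0, 90.0], tolerance_deg=20.0)` assigns the bins at 10° and 350° to the first source, the bin at 100° to the second source, and the bin at 200° to the noise class. A practical version would likely be vectorized and may use soft rather than binary masks.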
Keywords
BLIND SOURCE SEPARATION, NETWORKS, Guided source separation, Spatial clustering, Direction of arrival, Time-frequency masks

Downloads

  • DS531.pdf
    • full text (Published version)
    • open access
    • PDF
    • 1.95 MB

Citation

MLA
Bohlender, Alexander, et al. “DOA-Guided Source Separation with Direction-Based Initialization and Time Annotations Using Complex Angular Central Gaussian Mixture Models.” EURASIP JOURNAL ON AUDIO SPEECH AND MUSIC PROCESSING, vol. 2022, no. 1, 2022, doi:10.1186/s13636-022-00246-7.
APA
Bohlender, A., Van Severen, L., Sterckx, J., & Madhu, N. (2022). DOA-guided source separation with direction-based initialization and time annotations using complex angular central Gaussian mixture models. EURASIP JOURNAL ON AUDIO SPEECH AND MUSIC PROCESSING, 2022(1). https://doi.org/10.1186/s13636-022-00246-7
Chicago author-date
Bohlender, Alexander, Lucas Van Severen, Jonathan Sterckx, and Nilesh Madhu. 2022. “DOA-Guided Source Separation with Direction-Based Initialization and Time Annotations Using Complex Angular Central Gaussian Mixture Models.” EURASIP JOURNAL ON AUDIO SPEECH AND MUSIC PROCESSING 2022 (1). https://doi.org/10.1186/s13636-022-00246-7.
Chicago author-date (all authors)
Bohlender, Alexander, Lucas Van Severen, Jonathan Sterckx, and Nilesh Madhu. 2022. “DOA-Guided Source Separation with Direction-Based Initialization and Time Annotations Using Complex Angular Central Gaussian Mixture Models.” EURASIP JOURNAL ON AUDIO SPEECH AND MUSIC PROCESSING 2022 (1). doi:10.1186/s13636-022-00246-7.
Vancouver
1. Bohlender A, Van Severen L, Sterckx J, Madhu N. DOA-guided source separation with direction-based initialization and time annotations using complex angular central Gaussian mixture models. EURASIP JOURNAL ON AUDIO SPEECH AND MUSIC PROCESSING. 2022;2022(1).
IEEE
[1] A. Bohlender, L. Van Severen, J. Sterckx, and N. Madhu, “DOA-guided source separation with direction-based initialization and time annotations using complex angular central Gaussian mixture models,” EURASIP JOURNAL ON AUDIO SPEECH AND MUSIC PROCESSING, vol. 2022, no. 1, 2022.
BibTeX
@article{8758927,
  abstract     = {{By means of spatial clustering and time-frequency masking, a mixture of multiple speakers and noise can be separated into the underlying signal components. The parameters of a model, such as a complex angular central Gaussian mixture model (cACGMM), can be determined based on the given signal mixture itself. Then, no misfit between training and testing conditions arises, as opposed to approaches that require labeled datasets to be trained. Whereas the separation can be performed in a completely unsupervised way, it may be beneficial to take advantage of a priori knowledge. The parameter estimation is sensitive to the initialization, and it is necessary to address the frequency permutation problem. In this paper, we therefore consider three techniques to overcome these limitations using direction of arrival (DOA) estimates. First, we propose an initialization with simple DOA-based masks. Secondly, we derive speaker specific time annotations from the same masks in order to constrain the cACGMM. Thirdly, we employ an approach where the mixture components are specific to each DOA instead of each speaker. We conduct experiments with sudden DOA changes, as well as a gradually moving speaker. The results demonstrate that particularly the DOA-based initialization is effective to overcome both of the described limitations. In this case, even methods based on normally unavailable oracle information are not observed to be more beneficial to the permutation resolution or the initialization. Lastly, we also show that the proposed DOA-guided source separation works quite robustly in the presence of adverse conditions and realistic DOA estimation errors.}},
  articleno    = {{16}},
  author       = {{Bohlender, Alexander and Van Severen, Lucas and Sterckx, Jonathan and Madhu, Nilesh}},
  issn         = {{1687-4722}},
  journal      = {{EURASIP JOURNAL ON AUDIO SPEECH AND MUSIC PROCESSING}},
  keywords     = {{BLIND SOURCE SEPARATION,NETWORKS,Guided source separation,Spatial clustering,Direction of arrival,Time-frequency masks}},
  language     = {{eng}},
  number       = {{1}},
  pages        = {{21}},
  title        = {{DOA-guided source separation with direction-based initialization and time annotations using complex angular central Gaussian mixture models}},
  url          = {{http://dx.doi.org/10.1186/s13636-022-00246-7}},
  volume       = {{2022}},
  year         = {{2022}},
}
