NMF-weighted SRP for multi-speaker direction of arrival estimation : robustness to spatial aliasing while exploiting sparsity in the atom-time domain

Abstract
Localization of multiple speakers using microphone arrays remains a challenging problem, especially in the presence of noise and reverberation. State-of-the-art localization algorithms generally exploit the sparsity of speech in some representation for this purpose. Whereas the broadband approaches exploit time-domain sparsity for multi-speaker localization, narrowband approaches can additionally exploit sparsity and disjointness in the time-frequency representation. Broadband approaches are robust to spatial aliasing but do not optimally exploit the frequency domain sparsity, leading to poor localization performance for arrays with short inter-microphone distances. Narrowband approaches, on the other hand, are vulnerable to spatial aliasing, making them unsuitable for arrays with large inter-microphone spacing. Proposed here is an approach that decomposes a signal spectrum into a weighted sum of broadband spectral components (atoms) and then exploits signal sparsity in the time-atom representation for simultaneous multiple source localization. The decomposition into atoms is performed in situ using non-negative matrix factorization (NMF) of the short-term amplitude spectra and the localization estimate is obtained via a broadband steered-response power (SRP) approach for each active atom of a time frame. This SRP-NMF approach thereby combines the advantages of the narrowband and broadband approaches and performs well on the multi-speaker localization task for a broad range of inter-microphone spacings. On tests conducted on real-world data from public challenges such as SiSEC and LOCATA, and on data generated from recorded room impulse responses, the SRP-NMF approach outperforms the commonly used variants of narrowband and broadband localization approaches in terms of source detection capability and localization accuracy.
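The pipeline described in the abstract (NMF of the short-term amplitude spectra, followed by a broadband SRP evaluated per atom) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a single two-microphone pair, a far-field source, PHAT-normalized cross-spectra, and Euclidean-cost multiplicative NMF updates; the function names `nmf` and `atom_weighted_srp` and all parameter values are hypothetical and chosen only for illustration.

```python
import numpy as np

def nmf(V, n_atoms, n_iter=200, seed=0, eps=1e-12):
    """Factor a non-negative matrix V (freq x frames) as W @ H.

    Uses Lee-Seung multiplicative updates for the Euclidean cost
    (an assumption; the paper may use a different cost or update rule).
    Columns of W are the broadband spectral atoms, rows of H their activations.
    """
    rng = np.random.default_rng(seed)
    W = rng.random((V.shape[0], n_atoms)) + eps
    H = rng.random((n_atoms, V.shape[1])) + eps
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

def atom_weighted_srp(X1, X2, W, freqs, mic_dist, angles_deg, c=343.0):
    """Atom-weighted SRP map for one microphone pair.

    X1, X2: complex STFTs (freq x frames) of the two microphones.
    Returns an array of shape (atoms, frames, angles): for each atom, the
    PHAT-normalized cross-spectrum is steered to every candidate angle and
    summed over frequency using that atom's spectrum as weights.
    """
    cross = X1 * np.conj(X2)
    cross = cross / (np.abs(cross) + 1e-12)                   # PHAT normalization
    tau = mic_dist * np.cos(np.deg2rad(angles_deg)) / c       # candidate delays (s)
    steer = np.exp(-2j * np.pi * freqs[:, None] * tau[None, :])  # (freq, angles)
    return np.real(np.einsum("fk,ft,fa->kta", W, cross, steer))

if __name__ == "__main__":
    # Synthetic check: one source at 60 degrees, microphones 10 cm apart.
    rng = np.random.default_rng(1)
    freqs = np.linspace(0.0, 8000.0, 257)
    angles = np.arange(0, 181, 5)
    true_tau = 0.1 * np.cos(np.deg2rad(60.0)) / 343.0
    X1 = rng.standard_normal((257, 20)) + 1j * rng.standard_normal((257, 20))
    X2 = X1 * np.exp(-2j * np.pi * freqs * true_tau)[:, None]  # delayed copy
    W, H = nmf(np.abs(X1), n_atoms=4)
    srp = atom_weighted_srp(X1, X2, W, freqs, 0.1, angles)
    dominant = H.argmax(axis=0)                 # most active atom per frame
    frames = np.arange(H.shape[1])
    print(angles[srp[dominant, frames].argmax(axis=1)])  # DOA estimate per frame
```

In this sketch only the most active atom of each frame contributes a DOA estimate. With several speakers, one would instead keep every atom whose activation exceeds some threshold and pool the per-atom estimates over time, for example in a histogram; how the paper performs this step is not stated in the abstract.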
Keywords
Acoustics and Ultrasonics, Electrical and Electronic Engineering, Sound source localization, Direction-of-arrival, Non-negative matrix factorization, Spatial aliasing, Speech sparsity

Downloads

  • DS408.pdf
    • full text (Published version) | open access | PDF | 2.03 MB

Citation

Please use this url to cite or link to this publication:

MLA
Thakallapalli, Sushmita, et al. “NMF-Weighted SRP for Multi-Speaker Direction of Arrival Estimation : Robustness to Spatial Aliasing While Exploiting Sparsity in the Atom-Time Domain.” EURASIP JOURNAL ON AUDIO SPEECH AND MUSIC PROCESSING, vol. 2021, 2021, doi:10.1186/s13636-021-00201-y.
APA
Thakallapalli, S., Gangashetty, S. V., & Madhu, N. (2021). NMF-weighted SRP for multi-speaker direction of arrival estimation : robustness to spatial aliasing while exploiting sparsity in the atom-time domain. EURASIP JOURNAL ON AUDIO SPEECH AND MUSIC PROCESSING, 2021. https://doi.org/10.1186/s13636-021-00201-y
Chicago author-date
Thakallapalli, Sushmita, Suryakanth V. Gangashetty, and Nilesh Madhu. 2021. “NMF-Weighted SRP for Multi-Speaker Direction of Arrival Estimation : Robustness to Spatial Aliasing While Exploiting Sparsity in the Atom-Time Domain.” EURASIP JOURNAL ON AUDIO SPEECH AND MUSIC PROCESSING 2021. https://doi.org/10.1186/s13636-021-00201-y.
Chicago author-date (all authors)
Thakallapalli, Sushmita, Suryakanth V. Gangashetty, and Nilesh Madhu. 2021. “NMF-Weighted SRP for Multi-Speaker Direction of Arrival Estimation : Robustness to Spatial Aliasing While Exploiting Sparsity in the Atom-Time Domain.” EURASIP JOURNAL ON AUDIO SPEECH AND MUSIC PROCESSING 2021. doi:10.1186/s13636-021-00201-y.
Vancouver
1. Thakallapalli S, Gangashetty SV, Madhu N. NMF-weighted SRP for multi-speaker direction of arrival estimation : robustness to spatial aliasing while exploiting sparsity in the atom-time domain. EURASIP JOURNAL ON AUDIO SPEECH AND MUSIC PROCESSING. 2021;2021.
IEEE
[1] S. Thakallapalli, S. V. Gangashetty, and N. Madhu, “NMF-weighted SRP for multi-speaker direction of arrival estimation : robustness to spatial aliasing while exploiting sparsity in the atom-time domain,” EURASIP JOURNAL ON AUDIO SPEECH AND MUSIC PROCESSING, vol. 2021, 2021.
BibTeX
@article{8700074,
  abstract     = {{Localization of multiple speakers using microphone arrays remains a challenging problem, especially in the presence of noise and reverberation. State-of-the-art localization algorithms generally exploit the sparsity of speech in some representation for this purpose. Whereas the broadband approaches exploit time-domain sparsity for multi-speaker localization, narrowband approaches can additionally exploit sparsity and disjointness in the time-frequency representation. Broadband approaches are robust to spatial aliasing but do not optimally exploit the frequency domain sparsity, leading to poor localization performance for arrays with short inter-microphone distances. Narrowband approaches, on the other hand, are vulnerable to spatial aliasing, making them unsuitable for arrays with large inter-microphone spacing. Proposed here is an approach that decomposes a signal spectrum into a weighted sum of broadband spectral components (atoms) and then exploits signal sparsity in the time-atom representation for simultaneous multiple source localization. The decomposition into atoms is performed in situ using non-negative matrix factorization (NMF) of the short-term amplitude spectra and the localization estimate is obtained via a broadband steered-response power (SRP) approach for each active atom of a time frame. This SRP-NMF approach thereby combines the advantages of the narrowband and broadband approaches and performs well on the multi-speaker localization task for a broad range of inter-microphone spacings. On tests conducted on real-world data from public challenges such as SiSEC and LOCATA, and on data generated from recorded room impulse responses, the SRP-NMF approach outperforms the commonly used variants of narrowband and broadband localization approaches in terms of source detection capability and localization accuracy.}},
  articleno    = {{13}},
  author       = {{Thakallapalli, Sushmita and Gangashetty, Suryakanth V. and Madhu, Nilesh}},
  issn         = {{1687-4722}},
  journal      = {{EURASIP JOURNAL ON AUDIO SPEECH AND MUSIC PROCESSING}},
  keywords     = {{Acoustics and Ultrasonics,Electrical and Electronic Engineering,Sound source localization,Direction-of-arrival,Non-negative matrix factorization,Spatial aliasing,Speech sparsity}},
  language     = {{eng}},
  pages        = {{18}},
  title        = {{NMF-weighted SRP for multi-speaker direction of arrival estimation : robustness to spatial aliasing while exploiting sparsity in the atom-time domain}},
  url          = {{http://dx.doi.org/10.1186/s13636-021-00201-y}},
  volume       = {{2021}},
  year         = {{2021}},
}
