
NMF-weighted SRP for multi-speaker direction of arrival estimation : robustness to spatial aliasing while exploiting sparsity in the atom-time domain
- Author
- Sushmita Thakallapalli, Suryakanth V. Gangashetty and Nilesh Madhu (UGent)
- Abstract
- Localization of multiple speakers using microphone arrays remains a challenging problem, especially in the presence of noise and reverberation. State-of-the-art localization algorithms generally exploit the sparsity of speech in some representation for this purpose. Whereas the broadband approaches exploit time-domain sparsity for multi-speaker localization, narrowband approaches can additionally exploit sparsity and disjointness in the time-frequency representation. Broadband approaches are robust to spatial aliasing but do not optimally exploit the frequency domain sparsity, leading to poor localization performance for arrays with short inter-microphone distances. Narrowband approaches, on the other hand, are vulnerable to spatial aliasing, making them unsuitable for arrays with large inter-microphone spacing. Proposed here is an approach that decomposes a signal spectrum into a weighted sum of broadband spectral components (atoms) and then exploits signal sparsity in the time-atom representation for simultaneous multiple source localization. The decomposition into atoms is performed in situ using non-negative matrix factorization (NMF) of the short-term amplitude spectra and the localization estimate is obtained via a broadband steered-response power (SRP) approach for each active atom of a time frame. This SRP-NMF approach thereby combines the advantages of the narrowband and broadband approaches and performs well on the multi-speaker localization task for a broad range of inter-microphone spacings. On tests conducted on real-world data from public challenges such as SiSEC and LOCATA, and on data generated from recorded room impulse responses, the SRP-NMF approach outperforms the commonly used variants of narrowband and broadband localization approaches in terms of source detection capability and localization accuracy.
- Keywords
- Acoustics and Ultrasonics, Electrical and Electronic Engineering, Sound source localization, Direction-of-arrival, Non-negative matrix factorization, Spatial aliasing, Speech sparsity
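The two steps named in the abstract, an NMF of the short-term amplitude spectra followed by a broadband SRP evaluated per active atom, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`nmf`, `atom_weighted_srp`), the Euclidean multiplicative-update NMF, the PHAT normalisation, the single two-microphone pair with a far-field delay model, and the relative activation threshold `rel_thresh` are all assumptions made for illustration only.

```python
import numpy as np

def nmf(V, n_atoms, n_iter=100, seed=0, eps=1e-12):
    """Factorise a non-negative matrix V (freq x time) as W @ H.

    Uses Lee-Seung multiplicative updates for the Euclidean cost
    (an assumed choice; the source does not specify the NMF variant).
    """
    rng = np.random.default_rng(seed)
    n_freq, n_frames = V.shape
    W = rng.random((n_freq, n_atoms)) + eps    # columns: broadband spectral atoms
    H = rng.random((n_atoms, n_frames)) + eps  # rows: atom activations over time
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

def atom_weighted_srp(X1, X2, W, h, freqs, mic_dist, angles_deg,
                      c=343.0, rel_thresh=0.1):
    """Return {atom index: DOA estimate in degrees} for one time frame.

    X1, X2 : complex spectra of the frame at the two microphones.
    W      : atoms (freq x n_atoms); h : activations of this frame.
    Assumes microphone 2 lags microphone 1 by mic_dist * cos(theta) / c.
    """
    cross = X1 * np.conj(X2)
    phat = cross / (np.abs(cross) + 1e-12)               # keep phase only
    tau = mic_dist * np.cos(np.deg2rad(angles_deg)) / c  # candidate delays
    steer = np.exp(-2j * np.pi * np.outer(tau, freqs))   # (n_angles, n_freq)
    # Hypothetical activity rule: atoms within rel_thresh of the strongest one.
    active = np.flatnonzero(h >= rel_thresh * h.max())
    doas = {}
    for k in active:
        # Broadband SRP in which each frequency bin is weighted by atom k.
        power = np.real(steer @ (W[:, k] * phat))
        doas[int(k)] = float(angles_deg[np.argmax(power)])
    return doas
```

In this sketch, `W` and `H` would come from `nmf` applied to the magnitude spectrogram of one channel, and `atom_weighted_srp` would then be called per frame with the column of `H` for that frame. Collecting the per-atom estimates over time (for example in a histogram over angle) is one plausible way to separate multiple speakers, although the source does not describe that step.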
Downloads
- DS408.pdf: full text (Published version), open access, 2.03 MB
Citation
Please use this url to cite or link to this publication: http://hdl.handle.net/1854/LU-8700074
- MLA
- Thakallapalli, Sushmita, et al. “NMF-Weighted SRP for Multi-Speaker Direction of Arrival Estimation : Robustness to Spatial Aliasing While Exploiting Sparsity in the Atom-Time Domain.” EURASIP JOURNAL ON AUDIO SPEECH AND MUSIC PROCESSING, vol. 2021, 2021, doi:10.1186/s13636-021-00201-y.
- APA
- Thakallapalli, S., Gangashetty, S. V., & Madhu, N. (2021). NMF-weighted SRP for multi-speaker direction of arrival estimation : robustness to spatial aliasing while exploiting sparsity in the atom-time domain. EURASIP JOURNAL ON AUDIO SPEECH AND MUSIC PROCESSING, 2021. https://doi.org/10.1186/s13636-021-00201-y
- Chicago author-date
- Thakallapalli, Sushmita, Suryakanth V. Gangashetty, and Nilesh Madhu. 2021. “NMF-Weighted SRP for Multi-Speaker Direction of Arrival Estimation : Robustness to Spatial Aliasing While Exploiting Sparsity in the Atom-Time Domain.” EURASIP JOURNAL ON AUDIO SPEECH AND MUSIC PROCESSING 2021. https://doi.org/10.1186/s13636-021-00201-y.
- Chicago author-date (all authors)
- Thakallapalli, Sushmita, Suryakanth V. Gangashetty, and Nilesh Madhu. 2021. “NMF-Weighted SRP for Multi-Speaker Direction of Arrival Estimation : Robustness to Spatial Aliasing While Exploiting Sparsity in the Atom-Time Domain.” EURASIP JOURNAL ON AUDIO SPEECH AND MUSIC PROCESSING 2021. doi:10.1186/s13636-021-00201-y.
- Vancouver
- 1. Thakallapalli S, Gangashetty SV, Madhu N. NMF-weighted SRP for multi-speaker direction of arrival estimation : robustness to spatial aliasing while exploiting sparsity in the atom-time domain. EURASIP JOURNAL ON AUDIO SPEECH AND MUSIC PROCESSING. 2021;2021.
- IEEE
- [1] S. Thakallapalli, S. V. Gangashetty, and N. Madhu, “NMF-weighted SRP for multi-speaker direction of arrival estimation : robustness to spatial aliasing while exploiting sparsity in the atom-time domain,” EURASIP JOURNAL ON AUDIO SPEECH AND MUSIC PROCESSING, vol. 2021, 2021.
@article{8700074,
  abstract = {{Localization of multiple speakers using microphone arrays remains a challenging problem, especially in the presence of noise and reverberation. State-of-the-art localization algorithms generally exploit the sparsity of speech in some representation for this purpose. Whereas the broadband approaches exploit time-domain sparsity for multi-speaker localization, narrowband approaches can additionally exploit sparsity and disjointness in the time-frequency representation. Broadband approaches are robust to spatial aliasing but do not optimally exploit the frequency domain sparsity, leading to poor localization performance for arrays with short inter-microphone distances. Narrowband approaches, on the other hand, are vulnerable to spatial aliasing, making them unsuitable for arrays with large inter-microphone spacing. Proposed here is an approach that decomposes a signal spectrum into a weighted sum of broadband spectral components (atoms) and then exploits signal sparsity in the time-atom representation for simultaneous multiple source localization. The decomposition into atoms is performed in situ using non-negative matrix factorization (NMF) of the short-term amplitude spectra and the localization estimate is obtained via a broadband steered-response power (SRP) approach for each active atom of a time frame. This SRP-NMF approach thereby combines the advantages of the narrowband and broadband approaches and performs well on the multi-speaker localization task for a broad range of inter-microphone spacings. On tests conducted on real-world data from public challenges such as SiSEC and LOCATA, and on data generated from recorded room impulse responses, the SRP-NMF approach outperforms the commonly used variants of narrowband and broadband localization approaches in terms of source detection capability and localization accuracy.}},
  articleno = {{13}},
  author = {{Thakallapalli, Sushmita and Gangashetty, Suryakanth V. and Madhu, Nilesh}},
  issn = {{1687-4722}},
  journal = {{EURASIP JOURNAL ON AUDIO SPEECH AND MUSIC PROCESSING}},
  keywords = {{Acoustics and Ultrasonics,Electrical and Electronic Engineering,Sound source localization,Direction-of-arrival,Non-negative matrix factorization,Spatial aliasing,Speech sparsity}},
  language = {{eng}},
  pages = {{18}},
  title = {{NMF-weighted SRP for multi-speaker direction of arrival estimation : robustness to spatial aliasing while exploiting sparsity in the atom-time domain}},
  url = {{http://dx.doi.org/10.1186/s13636-021-00201-y}},
  volume = {{2021}},
  year = {{2021}},
}