Insights into magnitude and phase estimation by masking and mapping in DNN-based multichannel speaker separation

Abstract
Speakers are often separated by time-frequency masking in the short-time Fourier domain to take advantage of the high degree of sparsity of the individual speech spectrograms. Magnitude and phase can be jointly enhanced with complex masks, but prior work suggests that directly mapping the input to the complex spectrogram of the clean signal is a better alternative. For a setup with a compact microphone array, experiments conducted in this paper compare these paradigms with focus on magnitude and phase estimation. Whereas phase is enhanced effectively in general, differences between masking and mapping are minor in this regard. Spectral mapping causes the least target distortion. Complex masking better suppresses interference, but speech quality suffers due to artifacts. Combining magnitude masking with phase mapping presents a compromise, which amounts to the best performance regarding instrumental metrics.

Downloads

  • DS806 acc.pdf
    • full text (Accepted manuscript) | open access | PDF | 621.70 KB
  • (...).pdf
    • full text (Published version) | UGent only | PDF | 1.31 MB

Citation

MLA
Bohlender, Alexander, et al. “Insights into Magnitude and Phase Estimation by Masking and Mapping in DNN-Based Multichannel Speaker Separation.” 2024 IEEE International Conference on Acoustics, Speech, and Signal Processing Workshops (ICASSPW), IEEE, 2024, pp. 500–04, doi:10.1109/icasspw62465.2024.10627167.
APA
Bohlender, A., Spriet, A., Tirry, W., & Madhu, N. (2024). Insights into magnitude and phase estimation by masking and mapping in DNN-based multichannel speaker separation. 2024 IEEE International Conference on Acoustics, Speech, and Signal Processing Workshops (ICASSPW), 500–504. https://doi.org/10.1109/icasspw62465.2024.10627167
Chicago author-date
Bohlender, Alexander, Anneleen Spriet, Wouter Tirry, and Nilesh Madhu. 2024. “Insights into Magnitude and Phase Estimation by Masking and Mapping in DNN-Based Multichannel Speaker Separation.” In 2024 IEEE International Conference on Acoustics, Speech, and Signal Processing Workshops (ICASSPW), 500–504. IEEE. https://doi.org/10.1109/icasspw62465.2024.10627167.
Chicago author-date (all authors)
Bohlender, Alexander, Anneleen Spriet, Wouter Tirry, and Nilesh Madhu. 2024. “Insights into Magnitude and Phase Estimation by Masking and Mapping in DNN-Based Multichannel Speaker Separation.” In 2024 IEEE International Conference on Acoustics, Speech, and Signal Processing Workshops (ICASSPW), 500–504. IEEE. doi:10.1109/icasspw62465.2024.10627167.
Vancouver
1. Bohlender A, Spriet A, Tirry W, Madhu N. Insights into magnitude and phase estimation by masking and mapping in DNN-based multichannel speaker separation. In: 2024 IEEE International Conference on Acoustics, Speech, and Signal Processing Workshops (ICASSPW). IEEE; 2024. p. 500–4.
IEEE
[1] A. Bohlender, A. Spriet, W. Tirry, and N. Madhu, “Insights into magnitude and phase estimation by masking and mapping in DNN-based multichannel speaker separation,” in 2024 IEEE International Conference on Acoustics, Speech, and Signal Processing Workshops (ICASSPW), Seoul, Republic of Korea, 2024, pp. 500–504.
BibTeX
@inproceedings{01J6W04Z9C49BBWVCAB0FPG356,
  abstract     = {{Speakers are often separated by time-frequency masking in the short-time Fourier domain to take advantage of the high degree of sparsity of the individual speech spectrograms. Magnitude and phase can be jointly enhanced with complex masks, but prior work suggests that directly mapping the input to the complex spectrogram of the clean signal is a better alternative. For a setup with a compact microphone array, experiments conducted in this paper compare these paradigms with focus on magnitude and phase estimation. Whereas phase is enhanced effectively in general, differences between masking and mapping are minor in this regard. Spectral mapping causes the least target distortion. Complex masking better suppresses interference, but speech quality suffers due to artifacts. Combining magnitude masking with phase mapping presents a compromise, which amounts to the best performance regarding instrumental metrics.}},
  author       = {{Bohlender, Alexander and Spriet, Anneleen and Tirry, Wouter and Madhu, Nilesh}},
  booktitle    = {{2024 IEEE International Conference on Acoustics, Speech, and Signal Processing Workshops (ICASSPW)}},
  isbn         = {{9798350374513}},
  language     = {{eng}},
  location     = {{Seoul, Republic of Korea}},
  pages        = {{500--504}},
  publisher    = {{IEEE}},
  title        = {{Insights into magnitude and phase estimation by masking and mapping in DNN-based multichannel speaker separation}},
  url          = {{https://doi.org/10.1109/icasspw62465.2024.10627167}},
  year         = {{2024}},
}
