
Robust acoustic scene classification in the presence of active foreground speech

Abstract
We present an iVector-based Acoustic Scene Classification (ASC) system suited for real-life settings where active foreground speech can be present. In the proposed system, each recording is represented by a fixed-length iVector that models the recording's important properties. A regularized Gaussian backend classifier with class-specific covariance models is used to extract the relevant acoustic scene information from these iVectors. To alleviate the large performance degradation when a foreground speaker dominates the captured signal, we investigate the use of the iVector framework on Mel-Frequency Cepstral Coefficients (MFCCs) that are derived from an estimate of the noise power spectral density. This noise-floor can be extracted in a statistical manner for single-channel recordings. We show that the use of noise-floor features is complementary to multi-condition training, in which foreground speech is added to the training signals to reduce the mismatch between training and testing conditions. Experimental results on the DCASE 2016 Task 1 dataset show that the noise-floor-based features and multi-condition training realize significant classification accuracy gains, exceeding 25 percentage points (absolute) in the most adverse conditions. These promising results can further facilitate the integration of ASC in resource-constrained devices such as hearables.
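The abstract describes deriving MFCCs from a statistical noise-floor estimate of the power spectral density rather than from the raw spectrum. As a rough illustration of that idea, the sketch below tracks a per-bin sliding minimum of the power spectrogram (a simplified, minimum-statistics-style noise-floor estimator, not the authors' exact method) and passes it through a standard mel filterbank and DCT. The function name `noise_floor_mfcc` and all parameters are hypothetical choices for this sketch.

```python
import numpy as np

def noise_floor_mfcc(signal, sr=16000, n_fft=512, hop=256,
                     n_mels=26, n_mfcc=13, win_frames=50):
    """Sketch: MFCCs computed from a sliding-minimum noise-floor PSD estimate.

    The minimum-tracking below is a simplified stand-in for a statistical
    noise-floor estimator; it is NOT the paper's implementation.
    """
    # Frame the signal and compute the power spectrogram.
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack([signal[i*hop:i*hop+n_fft] * window
                       for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2      # (frames, bins)

    # Noise-floor PSD: per-bin minimum over a sliding window of past frames.
    floor = np.empty_like(power)
    for t in range(n_frames):
        lo = max(0, t - win_frames + 1)
        floor[t] = power[lo:t+1].min(axis=0)

    # Triangular mel filterbank.
    def hz_to_mel(f): return 2595 * np.log10(1 + f / 700)
    def mel_to_hz(m): return 700 * (10 ** (m / 2595) - 1)
    mel_pts = mel_to_hz(np.linspace(0, hz_to_mel(sr / 2), n_mels + 2))
    bins = np.floor((n_fft + 1) * mel_pts / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        l, c, r = bins[m-1], bins[m], bins[m+1]
        fb[m-1, l:c] = (np.arange(l, c) - l) / max(c - l, 1)
        fb[m-1, c:r] = (r - np.arange(c, r)) / max(r - c, 1)

    # Log mel energies of the noise floor, then a DCT-II -> MFCCs.
    logmel = np.log(floor @ fb.T + 1e-10)                 # (frames, n_mels)
    n = np.arange(n_mels)
    dct = np.cos(np.pi * np.outer(np.arange(n_mfcc), (2*n + 1) / (2*n_mels)))
    return logmel @ dct.T                                 # (frames, n_mfcc)
```

In the paper's pipeline, features like these would then feed the iVector extractor; the point of the noise-floor step is that a dominant foreground talker affects the spectral minima far less than the raw spectrum.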
Keywords
Acoustic scene classification, factor analysis, iVector, Gaussian backend, noise-floor estimation
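The backend described in the abstract scores each fixed-length iVector under per-class Gaussians with class-specific covariances, regularized to stay well-conditioned. The sketch below shrinks each class covariance toward the pooled covariance; the class name, the shrinkage scheme, and the `alpha` parameter are illustrative assumptions, not the paper's exact regularization.

```python
import numpy as np

class GaussianBackend:
    """Sketch of a regularized Gaussian backend over fixed-length embeddings.

    Class-specific covariances are shrunk toward the pooled covariance by a
    factor `alpha` (hypothetical simplification of the paper's classifier).
    """
    def __init__(self, alpha=0.5):
        self.alpha = alpha

    def fit(self, X, y):
        self.classes_ = np.unique(y)
        d = X.shape[1]
        self.means_ = np.stack([X[y == c].mean(axis=0) for c in self.classes_])
        covs = [np.cov(X[y == c], rowvar=False) for c in self.classes_]
        pooled = np.mean(covs, axis=0)
        # Shrink each class covariance toward the pooled one; the small
        # diagonal load keeps the matrices invertible.
        self.covs_ = [self.alpha * pooled + (1 - self.alpha) * c
                      + 1e-6 * np.eye(d) for c in covs]
        return self

    def predict(self, X):
        # Pick the class with the highest Gaussian log-likelihood.
        ll = np.stack([self._log_gauss(X, m, S)
                       for m, S in zip(self.means_, self.covs_)], axis=1)
        return self.classes_[ll.argmax(axis=1)]

    @staticmethod
    def _log_gauss(X, mean, cov):
        d = X.shape[1]
        diff = X - mean
        _, logdet = np.linalg.slogdet(cov)
        sol = np.linalg.solve(cov, diff.T).T
        return -0.5 * (d * np.log(2 * np.pi) + logdet
                       + np.einsum('ij,ij->i', diff, sol))
```

With `alpha=1` this collapses to a shared covariance (linear decision boundaries); smaller values let each scene class keep its own covariance shape, which is what the abstract's "class-specific covariance models" refers to.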

Downloads

  • DS458 acc.pdf: full text (Accepted manuscript) | open access | PDF | 227.63 KB
  • (...).pdf: full text (Published version) | UGent only | PDF | 226.61 KB

Citation


MLA
Song, Siyuan, et al. “Robust Acoustic Scene Classification in the Presence of Active Foreground Speech.” 2021 29th European Signal Processing Conference (EUSIPCO), IEEE, 2021, pp. 995–99, doi:10.23919/EUSIPCO54536.2021.9615984.
APA
Song, S., Desplanques, B., De Moor, C., Demuynck, K., & Madhu, N. (2021). Robust acoustic scene classification in the presence of active foreground speech. 2021 29th European Signal Processing Conference (EUSIPCO), 995–999. https://doi.org/10.23919/EUSIPCO54536.2021.9615984
Chicago author-date
Song, Siyuan, Brecht Desplanques, Celest De Moor, Kris Demuynck, and Nilesh Madhu. 2021. “Robust Acoustic Scene Classification in the Presence of Active Foreground Speech.” In 2021 29th European Signal Processing Conference (EUSIPCO), 995–99. IEEE. https://doi.org/10.23919/EUSIPCO54536.2021.9615984.
Chicago author-date (all authors)
Song, Siyuan, Brecht Desplanques, Celest De Moor, Kris Demuynck, and Nilesh Madhu. 2021. “Robust Acoustic Scene Classification in the Presence of Active Foreground Speech.” In 2021 29th European Signal Processing Conference (EUSIPCO), 995–999. IEEE. doi:10.23919/EUSIPCO54536.2021.9615984.
Vancouver
1. Song S, Desplanques B, De Moor C, Demuynck K, Madhu N. Robust acoustic scene classification in the presence of active foreground speech. In: 2021 29th European Signal Processing Conference (EUSIPCO). IEEE; 2021. p. 995–9.
IEEE
[1] S. Song, B. Desplanques, C. De Moor, K. Demuynck, and N. Madhu, “Robust acoustic scene classification in the presence of active foreground speech,” in 2021 29th European Signal Processing Conference (EUSIPCO), Dublin, Ireland (online), 2021, pp. 995–999.
@inproceedings{8721207,
  abstract     = {{We present an iVector-based Acoustic Scene Classification (ASC) system suited for real-life settings where active foreground speech can be present. In the proposed system, each recording is represented by a fixed-length iVector that models the recording's important properties. A regularized Gaussian backend classifier with class-specific covariance models is used to extract the relevant acoustic scene information from these iVectors. To alleviate the large performance degradation when a foreground speaker dominates the captured signal, we investigate the use of the iVector framework on Mel-Frequency Cepstral Coefficients (MFCCs) that are derived from an estimate of the noise power spectral density. This noise-floor can be extracted in a statistical manner for single-channel recordings. We show that the use of noise-floor features is complementary to multi-condition training, in which foreground speech is added to the training signals to reduce the mismatch between training and testing conditions. Experimental results on the DCASE 2016 Task 1 dataset show that the noise-floor-based features and multi-condition training realize significant classification accuracy gains, exceeding 25 percentage points (absolute) in the most adverse conditions. These promising results can further facilitate the integration of ASC in resource-constrained devices such as hearables.}},
  author       = {{Song, Siyuan and Desplanques, Brecht and De Moor, Celest and Demuynck, Kris and Madhu, Nilesh}},
  booktitle    = {{2021 29th European Signal Processing Conference (EUSIPCO)}},
  isbn         = {{9789082797060}},
  issn         = {{2076-1465}},
  keywords     = {{Acoustic scene classification,factor analysis,iVector,Gaussian backend,noise-floor estimation}},
  language     = {{eng}},
  location     = {{Dublin, Ireland (online)}},
  pages        = {{995--999}},
  publisher    = {{IEEE}},
  title        = {{Robust acoustic scene classification in the presence of active foreground speech}},
  url          = {{https://doi.org/10.23919/EUSIPCO54536.2021.9615984}},
  year         = {{2021}},
}
