
On the role of audio frontends in bird species recognition

Houtan Ghaffari Jadidi (UGent) and Paul Devos (UGent)
Abstract
Automatic acoustic monitoring of bird populations and their diversity is in demand for conservation planning. This requirement and recent advances in deep learning have inspired sophisticated species recognizers. However, there are still open challenges in creating reliable monitoring systems for natural habitats. One of many open questions is whether predominantly used audio features such as mel-filterbanks are appropriate for such analysis, since their design follows human perception of sound, making them prone to discarding fine details of other animals' vocalizations. Although research shows that different audio features work better for particular tasks and datasets, it is hard to attribute all advantages to the input features since the experimental setups vary. A general solution is to design a learnable audio frontend that extracts task-relevant features from the raw waveform, which contains all the information present in other audio features. The current paper thoroughly analyzes the role of such frontends in bird species recognition, which made it possible to evaluate the adequacy of traditional time-frequency representations (static frontends) in capturing the relevant information from bird vocalizations. In particular, this work shows that the main performance gain of learnable audio frontends comes from their normalization and compression operations rather than from data-driven frequency selectivity or the functional form of the filters. We observed no significant discrepancy between the frequency bands of the learned and static frontends for bird vocalizations. Although the performance of learnable frontends was much higher, we show that adequate normalization and compression improve the accuracy of traditional frontends by more than 16%, achieving comparable results for bird species recognition. Ablation studies of the frontends under different configurations and a detailed analysis of noise robustness provide evidence for these conclusions, validate the use of mel-filterbanks and similar features in prior work, and provide guidelines for designing future species recognizers. The code is available at https://github.com/houtan-ghaffari/bird-frontends.
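To make the abstract's claim more concrete, the minimal sketch below (in Python, using librosa; the file name, sampling rate, and filterbank parameters are illustrative assumptions, not the paper's settings, which are in the linked repository) builds a static mel-filterbank frontend and applies the kind of compression and per-example normalization that the study credits for most of the performance gain.

import numpy as np
import librosa

# Load a recording (file name and sampling rate are illustrative, not from the paper).
y, sr = librosa.load("bird_recording.wav", sr=32000, mono=True)

# Static frontend: a mel-filterbank time-frequency representation.
mel = librosa.feature.melspectrogram(y=y, sr=sr, n_fft=1024, hop_length=320, n_mels=64)

# Compression: log scaling (or PCEN) reduces the dynamic range of the filterbank energies.
log_mel = librosa.power_to_db(mel, ref=np.max)
pcen = librosa.pcen(mel, sr=sr, hop_length=320)  # per-channel energy normalization

# Normalization: standardize each example before passing it to the classifier backbone.
log_mel = (log_mel - log_mel.mean()) / (log_mel.std() + 1e-6)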
Keywords
Bioacoustics, Audio frontend, Bird sound recognition, Deep learning, REPRESENTATIONS, FEATURES, SOUNDS

Downloads

  • ACUS 709.pdf: full text (Published version), open access, PDF, 8.45 MB

Citation


MLA
Ghaffari Jadidi, Houtan, and Paul Devos. “On the Role of Audio Frontends in Bird Species Recognition.” ECOLOGICAL INFORMATICS, vol. 81, 2024, doi:10.1016/j.ecoinf.2024.102573.
APA
Ghaffari Jadidi, H., & Devos, P. (2024). On the role of audio frontends in bird species recognition. ECOLOGICAL INFORMATICS, 81. https://doi.org/10.1016/j.ecoinf.2024.102573
Chicago author-date
Ghaffari Jadidi, Houtan, and Paul Devos. 2024. “On the Role of Audio Frontends in Bird Species Recognition.” ECOLOGICAL INFORMATICS 81. https://doi.org/10.1016/j.ecoinf.2024.102573.
Chicago author-date (all authors)
Ghaffari Jadidi, Houtan, and Paul Devos. 2024. “On the Role of Audio Frontends in Bird Species Recognition.” ECOLOGICAL INFORMATICS 81. doi:10.1016/j.ecoinf.2024.102573.
Vancouver
1. Ghaffari Jadidi H, Devos P. On the role of audio frontends in bird species recognition. ECOLOGICAL INFORMATICS. 2024;81.
IEEE
[1] H. Ghaffari Jadidi and P. Devos, “On the role of audio frontends in bird species recognition,” ECOLOGICAL INFORMATICS, vol. 81, 2024.
@article{01HYG8REQ5B156T9NJ0M9SVM16,
  abstract     = {{Automatic acoustic monitoring of bird populations and their diversity is in demand for conservation planning. This requirement and recent advances in deep learning have inspired sophisticated species recognizers. However, there are still open challenges in creating reliable monitoring systems for natural habitats. One of many open questions is whether predominantly used audio features such as mel-filterbanks are appropriate for such analysis, since their design follows human perception of sound, making them prone to discarding fine details of other animals' vocalizations. Although research shows that different audio features work better for particular tasks and datasets, it is hard to attribute all advantages to the input features since the experimental setups vary. A general solution is to design a learnable audio frontend that extracts task-relevant features from the raw waveform, which contains all the information present in other audio features. The current paper thoroughly analyzes the role of such frontends in bird species recognition, which made it possible to evaluate the adequacy of traditional time-frequency representations (static frontends) in capturing the relevant information from bird vocalizations. In particular, this work shows that the main performance gain of learnable audio frontends comes from their normalization and compression operations rather than from data-driven frequency selectivity or the functional form of the filters. We observed no significant discrepancy between the frequency bands of the learned and static frontends for bird vocalizations. Although the performance of learnable frontends was much higher, we show that adequate normalization and compression improve the accuracy of traditional frontends by more than 16%, achieving comparable results for bird species recognition. Ablation studies of the frontends under different configurations and a detailed analysis of noise robustness provide evidence for these conclusions, validate the use of mel-filterbanks and similar features in prior work, and provide guidelines for designing future species recognizers. The code is available at https://github.com/houtan-ghaffari/bird-frontends.}},
  articleno    = {{102573}},
  author       = {{Ghaffari Jadidi, Houtan and Devos, Paul}},
  issn         = {{1574-9541}},
  journal      = {{ECOLOGICAL INFORMATICS}},
  keywords     = {{Bioacoustics,Audio frontend,Bird sound recognition,Deep learning,REPRESENTATIONS,FEATURES,SOUNDS}},
  language     = {{eng}},
  pages        = {{12}},
  title        = {{On the role of audio frontends in bird species recognition}},
  url          = {{http://doi.org/10.1016/j.ecoinf.2024.102573}},
  volume       = {{81}},
  year         = {{2024}},
}
