
Joint prediction of audio event and annoyance rating in an urban soundscape by hierarchical graph representation learning

Abstract
Sound events in daily life carry rich information about the objective world. The composition of these sounds affects the mood of people in a soundscape. Most previous approaches focus only on classifying and detecting audio events and scenes, but ignore their perceptual quality, which can affect how humans experience the environment, e.g. through annoyance. To this end, this paper proposes a novel hierarchical graph representation learning (HGRL) approach which links objective audio events (AE) with subjective annoyance ratings (AR) of the soundscape perceived by humans. The hierarchical graph consists of fine-grained event (fAE) embeddings with single-class event semantics, coarse-grained event (cAE) embeddings with multi-class event semantics, and AR embeddings. Experiments show that the proposed HGRL successfully integrates AE with AR for audio event classification (AEC) and annoyance rating prediction (ARP) tasks, while coordinating the relations between cAE and fAE and further aligning the two grains of AE information with the AR.
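The paper does not include reference code here; purely as an illustration of the hierarchy the abstract describes, the sketch below wires hypothetical fAE, cAE, and AR nodes into a small graph and propagates information upward by mean aggregation. All node names, embeddings, and the aggregation rule are assumptions for illustration, not the authors' learned model, which trains these relations end to end.

```python
# Illustrative sketch (not the authors' implementation): a three-level
# hierarchical graph in which fine-grained event (fAE) embeddings feed
# coarse-grained event (cAE) embeddings, which in turn feed a single
# annoyance-rating (AR) node. Aggregation is a plain mean here; HGRL
# instead learns these relations via graph representation learning.

def mean_vec(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]

# Hypothetical fine-grained event embeddings (single-class semantics).
fae = {
    "car_horn": [0.9, 0.1],
    "engine":   [0.8, 0.2],
    "birdsong": [0.1, 0.9],
}

# Coarse-grained events group several fine-grained ones
# (multi-class semantics).
cae_children = {
    "traffic": ["car_horn", "engine"],
    "nature":  ["birdsong"],
}

# Upward pass 1: each cAE embedding aggregates its fAE children.
cae = {name: mean_vec([fae[c] for c in kids])
       for name, kids in cae_children.items()}

# Upward pass 2: the AR node pools all cAE embeddings; a toy linear
# readout then maps the pooled embedding to a scalar annoyance score.
ar_embedding = mean_vec(list(cae.values()))
weights = [1.0, -1.0]  # assumed: first dim "noisy", second dim "pleasant"
annoyance = sum(w * x for w, x in zip(weights, ar_embedding))
print(round(annoyance, 3))
```

The point of the sketch is only the information flow fAE → cAE → AR; in the paper both grains of AE embeddings and the AR embedding are learned jointly, so the two tasks (AEC and ARP) share and align representations rather than using fixed pooling.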
Keywords
hierarchical graph representation learning, audio event classification, human annoyance rating prediction, NEURAL-NETWORKS, ATTENTION, MODEL

Downloads

  • (...).pdf
    full text (Published version) | UGent only | PDF | 2.09 MB
  • ACUS 650a.pdf
    full text (Accepted manuscript) | open access | PDF | 1.71 MB

Citation


MLA
Hou, Yuanbo, et al. “Joint Prediction of Audio Event and Annoyance Rating in an Urban Soundscape by Hierarchical Graph Representation Learning.” INTERSPEECH 2023, International Speech Communication Association (ISCA), 2023, pp. 331–35, doi:10.21437/interspeech.2023-1021.
APA
Hou, Y., Song, S., Luo, C., Mitchell, A., Ren, Q., Xie, W., … Int Speech Commun Assoc, [missing]. (2023). Joint prediction of audio event and annoyance rating in an urban soundscape by hierarchical graph representation learning. INTERSPEECH 2023, 331–335. https://doi.org/10.21437/interspeech.2023-1021
Chicago author-date
Hou, Yuanbo, Siyang Song, Cheng Luo, Andrew Mitchell, Qiaoqiao Ren, Weicheng Xie, Jian Kang, Wenwu Wang, Dick Botteldooren, and [missing] Int Speech Commun Assoc. 2023. “Joint Prediction of Audio Event and Annoyance Rating in an Urban Soundscape by Hierarchical Graph Representation Learning.” In INTERSPEECH 2023, 331–35. International Speech Communication Association (ISCA). https://doi.org/10.21437/interspeech.2023-1021.
Chicago author-date (all authors)
Hou, Yuanbo, Siyang Song, Cheng Luo, Andrew Mitchell, Qiaoqiao Ren, Weicheng Xie, Jian Kang, Wenwu Wang, Dick Botteldooren, and [missing] Int Speech Commun Assoc. 2023. “Joint Prediction of Audio Event and Annoyance Rating in an Urban Soundscape by Hierarchical Graph Representation Learning.” In INTERSPEECH 2023, 331–335. International Speech Communication Association (ISCA). doi:10.21437/interspeech.2023-1021.
Vancouver
1. Hou Y, Song S, Luo C, Mitchell A, Ren Q, Xie W, et al. Joint prediction of audio event and annoyance rating in an urban soundscape by hierarchical graph representation learning. In: INTERSPEECH 2023. International Speech Communication Association (ISCA); 2023. p. 331–5.
IEEE
[1] Y. Hou et al., “Joint prediction of audio event and annoyance rating in an urban soundscape by hierarchical graph representation learning,” in INTERSPEECH 2023, Dublin, Ireland, 2023, pp. 331–335.
@inproceedings{01HHHDV5KGCR9T3KR8XS1CS0Q6,
  abstract     = {{Sound events in daily life carry rich information about the objective world. The composition of these sounds affects the mood of people in a soundscape. Most previous approaches only focus on classifying and detecting audio events and scenes, but may ignore their perceptual quality that may impact humans' listening mood for the environment, e.g. annoyance. To this end, this paper proposes a novel hierarchical graph representation learning (HGRL) approach which links objective audio events (AE) with subjective annoyance ratings (AR) of the soundscape perceived by humans. The hierarchical graph consists of fine-grained event (fAE) embeddings with single-class event semantics, coarse-grained event (cAE) embeddings with multi-class event semantics, and AR embeddings. Experiments show the proposed HGRL successfully integrates AE with AR for AEC and ARP tasks, while coordinating the relations between cAE and fAE and further aligning the two different grains of AE information with the AR.}},
  author       = {{Hou, Yuanbo and Song, Siyang and Luo, Cheng and Mitchell, Andrew and Ren, Qiaoqiao and Xie, Weicheng and Kang, Jian and Wang, Wenwu and Botteldooren, Dick and Int Speech Commun Assoc, [missing]}},
  booktitle    = {{INTERSPEECH 2023}},
  issn         = {{2308-457X}},
  keywords     = {{hierarchical graph representation learning,audio event classification,human annoyance rating prediction,NEURAL-NETWORKS,ATTENTION,MODEL}},
  language     = {{eng}},
  location     = {{Dublin, Ireland}},
  pages        = {{331--335}},
  publisher    = {{International Speech Communication Association (ISCA)}},
  title        = {{Joint prediction of audio event and annoyance rating in an urban soundscape by hierarchical graph representation learning}},
  url          = {{https://doi.org/10.21437/interspeech.2023-1021}},
  year         = {{2023}},
}
