
Audio event-relational graph representation learning for acoustic scene classification

Abstract
Most deep learning-based acoustic scene classification (ASC) approaches identify scenes based on acoustic features converted from audio clips containing mixed information entangled by polyphonic audio events (AEs). However, these approaches have difficulties in explaining what cues they use to identify scenes. This letter conducts the first study on disclosing the relationship between real-life acoustic scenes and semantic embeddings from the most relevant AEs. Specifically, we propose an event-relational graph representation learning (ERGL) framework for ASC to classify scenes, and simultaneously answer clearly and directly which cues are used in classification. In the event-relational graph, embeddings of each event are treated as nodes, while relationship cues derived from each pair of nodes are described by multi-dimensional edge features. Experiments on a real-life ASC dataset show that the proposed ERGL achieves competitive performance on ASC by learning embeddings of only a limited number of AEs. The results show the feasibility of recognizing diverse acoustic scenes based on the audio event-relational graph.
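
As a rough illustration of the graph construction described in the abstract (a sketch, not the authors' released implementation), the PyTorch snippet below treats per-event embeddings as graph nodes, derives a multi-dimensional edge feature from each ordered pair of nodes, and pools the updated nodes into a scene prediction. All layer sizes, the edge and node MLPs, the mean aggregation, and the names (ERGLSketch, edge_mlp, node_mlp) are illustrative assumptions, not details taken from the letter.

import torch
import torch.nn as nn

class ERGLSketch(nn.Module):
    def __init__(self, embed_dim=64, edge_dim=8, num_scenes=10):
        super().__init__()
        # derives a multi-dimensional edge feature from each pair of nodes
        self.edge_mlp = nn.Sequential(
            nn.Linear(2 * embed_dim, edge_dim), nn.ReLU())
        # updates each node embedding from its aggregated edge messages
        self.node_mlp = nn.Sequential(
            nn.Linear(embed_dim + edge_dim, embed_dim), nn.ReLU())
        self.classifier = nn.Linear(embed_dim, num_scenes)

    def forward(self, nodes):                 # nodes: (batch, K, embed_dim)
        B, K, D = nodes.shape
        # build all ordered node pairs (i, j)
        src = nodes.unsqueeze(2).expand(B, K, K, D)
        dst = nodes.unsqueeze(1).expand(B, K, K, D)
        edges = self.edge_mlp(torch.cat([src, dst], dim=-1))  # (B, K, K, edge_dim)
        # aggregate edge messages per node, then update node embeddings
        msg = edges.mean(dim=2)                                # (B, K, edge_dim)
        nodes = self.node_mlp(torch.cat([nodes, msg], dim=-1))
        # graph readout: pool nodes, then classify the scene
        return self.classifier(nodes.mean(dim=1))

# usage: in the paper's setting, the node inputs would be embeddings of a
# limited number of audio events from a pretrained event model
x = torch.randn(4, 25, 64)                    # 4 clips, 25 event nodes each
logits = ERGLSketch()(x)                      # (4, 10) scene logits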
Keywords
Applied Mathematics, Electrical and Electronic Engineering, Signal Processing, graph representation learning, multi-dimensional edge, event-relational graph, Acoustic scene classification

Downloads

  • ACUS 649.pdf: full text (Published version) | open access | PDF | 1.58 MB

Citation

MLA
Hou, Yuanbo, et al. “Audio Event-Relational Graph Representation Learning for Acoustic Scene Classification.” IEEE SIGNAL PROCESSING LETTERS, vol. 30, 2023, pp. 1382–86, doi:10.1109/lsp.2023.3319233.
APA
Hou, Y., Song, S., Yu, C., Wang, W., & Botteldooren, D. (2023). Audio event-relational graph representation learning for acoustic scene classification. IEEE SIGNAL PROCESSING LETTERS, 30, 1382–1386. https://doi.org/10.1109/lsp.2023.3319233
Chicago author-date
Hou, Yuanbo, Siyang Song, Chuang Yu, Wenwu Wang, and Dick Botteldooren. 2023. “Audio Event-Relational Graph Representation Learning for Acoustic Scene Classification.” IEEE SIGNAL PROCESSING LETTERS 30: 1382–86. https://doi.org/10.1109/lsp.2023.3319233.
Chicago author-date (all authors)
Hou, Yuanbo, Siyang Song, Chuang Yu, Wenwu Wang, and Dick Botteldooren. 2023. “Audio Event-Relational Graph Representation Learning for Acoustic Scene Classification.” IEEE SIGNAL PROCESSING LETTERS 30: 1382–1386. doi:10.1109/lsp.2023.3319233.
Vancouver
1. Hou Y, Song S, Yu C, Wang W, Botteldooren D. Audio event-relational graph representation learning for acoustic scene classification. IEEE SIGNAL PROCESSING LETTERS. 2023;30:1382–6.
IEEE
[1] Y. Hou, S. Song, C. Yu, W. Wang, and D. Botteldooren, “Audio event-relational graph representation learning for acoustic scene classification,” IEEE SIGNAL PROCESSING LETTERS, vol. 30, pp. 1382–1386, 2023.
@article{01HHHD156KGAY2M9GW1K8CTP32,
  abstract     = {{Most deep learning-based acoustic scene classification (ASC) approaches identify scenes based on acoustic features converted from audio clips containing mixed information entangled by polyphonic audio events (AEs). However, these approaches have difficulties in explaining what cues they use to identify scenes. This letter conducts the first study on disclosing the relationship between real-life acoustic scenes and semantic embeddings from the most relevant AEs. Specifically, we propose an event-relational graph representation learning (ERGL) framework for ASC to classify scenes, and simultaneously answer clearly and directly which cues are used in classification. In the event-relational graph, embeddings of each event are treated as nodes, while relationship cues derived from each pair of nodes are described by multi-dimensional edge features. Experiments on a real-life ASC dataset show that the proposed ERGL achieves competitive performance on ASC by learning embeddings of only a limited number of AEs. The results show the feasibility of recognizing diverse acoustic scenes based on the audio event-relational graph.}},
  author       = {{Hou, Yuanbo and Song, Siyang and Yu, Chuang and Wang, Wenwu and Botteldooren, Dick}},
  issn         = {{1070-9908}},
  journal      = {{IEEE SIGNAL PROCESSING LETTERS}},
  keywords     = {{Applied Mathematics,Electrical and Electronic Engineering,Signal Processing,graph representation learning,multi-dimensional edge,event-relational graph,Acoustic scene classification}},
  language     = {{eng}},
  pages        = {{1382--1386}},
  title        = {{Audio event-relational graph representation learning for acoustic scene classification}},
  url          = {{https://doi.org/10.1109/lsp.2023.3319233}},
  volume       = {{30}},
  year         = {{2023}},
}
