
End-to-end learning for music audio

Sander Dieleman (UGent) and Benjamin Schrauwen (UGent)
Abstract
Content-based music information retrieval tasks have traditionally been solved using engineered features and shallow processing architectures. In recent years, there has been increasing interest in using feature learning and deep architectures instead, thus reducing the required engineering effort and the need for prior knowledge. However, this new approach typically still relies on mid-level representations of music audio, e.g. spectrograms, instead of raw audio signals. In this paper, we investigate whether it is possible to apply feature learning directly to raw audio signals. We train convolutional neural networks using both approaches and compare their performance on an automatic tagging task. Although they do not outperform a spectrogram-based approach, the networks are able to autonomously discover frequency decompositions from raw audio, as well as phase- and translation-invariant feature representations.
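The core idea of the abstract — replacing a hand-engineered spectrogram front end with a learned strided convolution over the raw waveform — can be sketched as follows. This is a minimal NumPy illustration, not the paper's actual architecture: the filter bank here is random where the paper's is learned, and the function name, filter count, and stride values are hypothetical choices for the example.

```python
import numpy as np

def conv1d_bank(signal, filters, stride):
    """Apply a bank of 1-D filters to a raw waveform with a fixed stride.

    signal:  (n_samples,) raw audio
    filters: (n_filters, filter_len) filter bank (learned, in the paper's setting)
    stride:  hop between frames, analogous to an STFT hop size
    Returns a (n_frames, n_filters) feature map.
    """
    n_filters, flen = filters.shape
    n_frames = (len(signal) - flen) // stride + 1
    # Gather strided frames of the signal: shape (n_frames, filter_len).
    idx = np.arange(flen) + stride * np.arange(n_frames)[:, None]
    frames = signal[idx]
    # Each output channel is a dot product with one filter followed by a
    # rectifier; with learned filters, such a layer can come to resemble a
    # frequency decomposition, as the paper observes.
    return np.maximum(frames @ filters.T, 0.0)

# Toy usage with random stand-in filters (hypothetical sizes).
rng = np.random.default_rng(0)
audio = rng.standard_normal(16000)          # ~1 s of audio at 16 kHz
filters = rng.standard_normal((32, 256))    # 32 filters of length 256
features = conv1d_bank(audio, filters, stride=256)
print(features.shape)                       # (62, 32)
```

In a full tagging model, further convolution and pooling layers would sit on top of this feature map to produce the phase- and translation-invariant representations the abstract mentions.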
Keywords
feature learning, convolutional neural networks, end-to-end learning, music information retrieval, automatic tagging, RECOGNITION, NETWORKS

Citation


MLA
Dieleman, Sander, and Benjamin Schrauwen. “End-to-end Learning for Music Audio.” International Conference on Acoustics Speech and Signal Processing ICASSP. IEEE, 2014. 6964–6968. Print.
APA
Dieleman, S., & Schrauwen, B. (2014). End-to-end learning for music audio. International Conference on Acoustics Speech and Signal Processing ICASSP (pp. 6964–6968). Presented at the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), IEEE.
Chicago author-date
Dieleman, Sander, and Benjamin Schrauwen. 2014. “End-to-end Learning for Music Audio.” In International Conference on Acoustics Speech and Signal Processing ICASSP, 6964–6968. IEEE.
Chicago author-date (all authors)
Dieleman, Sander, and Benjamin Schrauwen. 2014. “End-to-end Learning for Music Audio.” In International Conference on Acoustics Speech and Signal Processing ICASSP, 6964–6968. IEEE.
Vancouver
1. Dieleman S, Schrauwen B. End-to-end learning for music audio. International Conference on Acoustics Speech and Signal Processing ICASSP. IEEE; 2014. p. 6964–8.
IEEE
[1] S. Dieleman and B. Schrauwen, “End-to-end learning for music audio,” in International Conference on Acoustics Speech and Signal Processing ICASSP, Florence, Italy, 2014, pp. 6964–6968.
BibTeX
@inproceedings{5952091,
  abstract     = {Content-based music information retrieval tasks have traditionally been solved using engineered features and shallow processing architectures. In recent years, there has been increasing interest in using feature learning and deep architectures instead, thus reducing the required engineering effort and the need for prior knowledge. However, this new approach typically still relies on mid-level representations of music audio, e.g. spectrograms, instead of raw audio signals. In this paper, we investigate whether it is possible to apply feature learning directly to raw audio signals. We train convolutional neural networks using both approaches and compare their performance on an automatic tagging task. Although they do not outperform a spectrogram-based approach, the networks are able to autonomously discover frequency decompositions from raw audio, as well as phase- and translation-invariant feature representations.},
  author       = {Dieleman, Sander and Schrauwen, Benjamin},
  booktitle    = {International Conference on Acoustics Speech and Signal Processing ICASSP},
  isbn         = {9781479928934},
  issn         = {1520-6149},
  keywords     = {feature learning,convolutional neural networks,end-to-end learning,music information retrieval,automatic tagging,RECOGNITION,NETWORKS},
  language     = {eng},
  location     = {Florence, Italy},
  pages        = {6964--6968},
  publisher    = {IEEE},
  title        = {End-to-end learning for music audio},
  url          = {http://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=6854950},
  year         = {2014},
}
