Ghent University Academic Bibliography


Audio-based music classification with a pretrained convolutional network

Sander Dieleman, Philémon Brakel and Benjamin Schrauwen (2011). Proceedings of the 12th international society for music information retrieval conference : Proc. ISMIR 2011, pp. 669–674.
abstract
Recently the ‘Million Song Dataset’, containing audio features and metadata for one million songs, was made available. In this paper, we build a convolutional network that is then trained to perform artist recognition, genre recognition and key detection. The network is tailored to summarize the audio features over musically significant timescales. It is infeasible to train the network on all available data in a supervised fashion, so we use unsupervised pretraining to be able to harness the entire dataset: we train a convolutional deep belief network on all data, and then use the learnt parameters to initialize a convolutional multilayer perceptron with the same architecture. The MLP is then trained on a labeled subset of the data for each task. We also train the same MLP with randomly initialized weights. We find that our convolutional approach improves accuracy for the genre recognition and artist recognition tasks. Unsupervised pretraining improves convergence speed in all cases. For artist recognition it improves accuracy as well.
Please use this URL to cite or link to this publication: http://hdl.handle.net/1854/LU-1989534
author
Sander Dieleman, Philémon Brakel and Benjamin Schrauwen
year
2011
type
conference
publication status
published
in
Proceedings of the 12th international society for music information retrieval conference : Proc. ISMIR 2011
editor
Anssi Klapuri and Colby Leider
pages
669–674
publisher
University of Miami
place of publication
Miami, FL, USA
conference name
12th International Society for Music Information Retrieval Conference (ISMIR - 2011)
conference location
Miami, FL, USA
conference start
2011-10-24
conference end
2011-10-28
ISBN
9780615548654
language
English
UGent publication?
yes
classification
C1
copyright statement
I have retained and own the full copyright for this publication
id
1989534
handle
http://hdl.handle.net/1854/LU-1989534
date created
2012-01-17 12:44:16
date last changed
2015-06-17 09:54:23
@inproceedings{1989534,
  abstract     = {Recently the {\textquoteleft}Million Song Dataset{\textquoteright}, containing audio features and metadata for one million songs, was made available. In this paper, we build a convolutional network that is then trained to perform artist recognition, genre recognition and key detection. The network is tailored to summarize the audio features over musically significant timescales. It is infeasible to train the network on all available data in a supervised fashion, so we use unsupervised pretraining to be able to harness the entire dataset: we train a convolutional deep belief network on all data, and then use the learnt parameters to initialize a convolutional multilayer perceptron with the same architecture. The MLP is then trained on a labeled subset of the data for each task. We also train the same MLP with randomly initialized weights. We find that our convolutional approach improves accuracy for the genre recognition and artist recognition tasks. Unsupervised pretraining improves convergence speed in all cases. For artist recognition it improves accuracy as well.},
  author       = {Dieleman, Sander and Brakel, Phil{\'e}mon and Schrauwen, Benjamin},
  booktitle    = {Proceedings of the 12th international society for music information retrieval conference : Proc. ISMIR 2011},
  editor       = {Klapuri, Anssi and Leider, Colby},
  isbn         = {9780615548654},
  language     = {eng},
  location     = {Miami, FL, USA},
  pages        = {669--674},
  publisher    = {University of Miami},
  title        = {Audio-based music classification with a pretrained convolutional network},
  year         = {2011},
}

Chicago
Dieleman, Sander, Philémon Brakel, and Benjamin Schrauwen. 2011. “Audio-based Music Classification with a Pretrained Convolutional Network.” In Proceedings of the 12th International Society for Music Information Retrieval Conference : Proc. ISMIR 2011, ed. Anssi Klapuri and Colby Leider, 669–674. Miami, FL, USA: University of Miami.
APA
Dieleman, S., Brakel, P., & Schrauwen, B. (2011). Audio-based music classification with a pretrained convolutional network. In A. Klapuri & C. Leider (Eds.), Proceedings of the 12th international society for music information retrieval conference : Proc. ISMIR 2011 (pp. 669–674). Presented at the 12th International Society for Music Information Retrieval Conference (ISMIR - 2011), Miami, FL, USA: University of Miami.
Vancouver
1. Dieleman S, Brakel P, Schrauwen B. Audio-based music classification with a pretrained convolutional network. In: Klapuri A, Leider C, editors. Proceedings of the 12th international society for music information retrieval conference : Proc. ISMIR 2011. Miami, FL, USA: University of Miami; 2011. p. 669–74.
MLA
Dieleman, Sander, Philémon Brakel, and Benjamin Schrauwen. “Audio-based Music Classification with a Pretrained Convolutional Network.” Proceedings of the 12th International Society for Music Information Retrieval Conference : Proc. ISMIR 2011. Ed. Anssi Klapuri and Colby Leider. Miami, FL, USA: University of Miami, 2011. 669–674. Print.