
Audio-based music classification with a pretrained convolutional network

Sander Dieleman (UGent), Philémon Brakel (UGent) and Benjamin Schrauwen (UGent)
Abstract
Recently the ‘Million Song Dataset’, containing audio features and metadata for one million songs, was made available. In this paper, we build a convolutional network that is then trained to perform artist recognition, genre recognition and key detection. The network is tailored to summarize the audio features over musically significant timescales. It is infeasible to train the network on all available data in a supervised fashion, so we use unsupervised pretraining to be able to harness the entire dataset: we train a convolutional deep belief network on all data, and then use the learnt parameters to initialize a convolutional multilayer perceptron with the same architecture. The MLP is then trained on a labeled subset of the data for each task. We also train the same MLP with randomly initialized weights. We find that our convolutional approach improves accuracy for the genre recognition and artist recognition tasks. Unsupervised pretraining improves convergence speed in all cases. For artist recognition it improves accuracy as well.

Downloads

  • PS6-3.pdf (full text, open access, PDF, 379.23 KB)

Citation

Chicago
Dieleman, Sander, Philémon Brakel, and Benjamin Schrauwen. 2011. “Audio-based Music Classification with a Pretrained Convolutional Network.” In Proceedings of the 12th International Society for Music Information Retrieval Conference : Proc. ISMIR 2011, ed. Anssi Klapuri and Colby Leider, 669–674. Miami, FL, USA: University of Miami.
APA
Dieleman, S., Brakel, P., & Schrauwen, B. (2011). Audio-based music classification with a pretrained convolutional network. In A. Klapuri & C. Leider (Eds.), Proceedings of the 12th international society for music information retrieval conference : Proc. ISMIR 2011 (pp. 669–674). Presented at the 12th International Society for Music Information Retrieval Conference (ISMIR - 2011), Miami, FL, USA: University of Miami.
Vancouver
1. Dieleman S, Brakel P, Schrauwen B. Audio-based music classification with a pretrained convolutional network. In: Klapuri A, Leider C, editors. Proceedings of the 12th international society for music information retrieval conference : Proc. ISMIR 2011. Miami, FL, USA: University of Miami; 2011. p. 669–74.
MLA
Dieleman, Sander, Philémon Brakel, and Benjamin Schrauwen. “Audio-based Music Classification with a Pretrained Convolutional Network.” Proceedings of the 12th International Society for Music Information Retrieval Conference : Proc. ISMIR 2011. Ed. Anssi Klapuri & Colby Leider. Miami, FL, USA: University of Miami, 2011. 669–674. Print.
@inproceedings{1989534,
  abstract     = {Recently the {\textquoteleft}Million Song Dataset{\textquoteright}, containing audio features and metadata for one million songs, was made available. In this paper, we build a convolutional network that is then trained to perform artist recognition, genre recognition and key detection. The network is tailored to summarize the audio features over musically significant timescales. It is infeasible to train the network on all available data in a supervised fashion, so we use unsupervised pretraining to be able to harness the entire dataset: we train a convolutional deep belief network on all data, and then use the learnt parameters to initialize a convolutional multilayer perceptron with the same architecture. The MLP is then trained on a labeled subset of the data for each task. We also train the same MLP with randomly initialized weights. We find that our convolutional approach improves accuracy for the genre recognition and artist recognition tasks. Unsupervised pretraining improves convergence speed in all cases. For artist recognition it improves accuracy as well.},
  author       = {Dieleman, Sander and Brakel, Phil{\'e}mon and Schrauwen, Benjamin},
  booktitle    = {Proceedings of the 12th international society for music information retrieval conference : Proc. ISMIR 2011},
  editor       = {Klapuri, Anssi and Leider, Colby},
  isbn         = {9780615548654},
  language     = {eng},
  location     = {Miami, FL, USA},
  pages        = {669--674},
  publisher    = {University of Miami},
  title        = {Audio-based music classification with a pretrained convolutional network},
  year         = {2011},
}