
Transfer learning by supervised pre-training for audio-based music classification

Abstract
Very few large-scale music research datasets are publicly available. There is an increasing need for such datasets, because the shift from physical to digital distribution in the music industry has given the listener access to a large body of music, which needs to be cataloged efficiently and be easily browsable. Additionally, deep learning and feature learning techniques are becoming increasingly popular for music information retrieval applications, and they typically require large amounts of training data to work well. In this paper, we propose to exploit an available large-scale music dataset, the Million Song Dataset (MSD), for classification tasks on other datasets, by reusing models trained on the MSD for feature extraction. This transfer learning approach, which we refer to as supervised pre-training, was previously shown to be very effective for computer vision problems. We show that features learned from MSD audio fragments in a supervised manner, using tag labels and user listening data, consistently outperform features learned in an unsupervised manner in this setting, provided that the learned feature extractor is of limited complexity. We evaluate our approach on the GTZAN, 1517-Artists, Unique and Magnatagatune datasets.
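The core idea in the abstract can be sketched in a few lines: train a model with supervision on a large source dataset, then reuse its learned intermediate representation as a feature extractor for a smaller target task. The sketch below is a minimal illustration only, not the authors' method: it substitutes synthetic arrays for MSD audio fragments and tag labels, and a small scikit-learn MLP for the networks used in the paper; all variable names are hypothetical.

```python
# Hypothetical sketch of supervised pre-training for transfer learning.
# Synthetic data stands in for MSD audio features (source) and for a
# target dataset such as GTZAN; a small MLP stands in for the paper's models.
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Source task: stand-in for MSD features with tag-derived labels.
X_src = rng.normal(size=(500, 40))
y_src = (X_src[:, :5].sum(axis=1) > 0).astype(int)

# Supervised pre-training: fit a one-hidden-layer network on source labels.
pretrained = MLPClassifier(hidden_layer_sizes=(32,), max_iter=300,
                           random_state=0).fit(X_src, y_src)

def extract_features(X):
    # Reuse the pre-trained hidden layer (ReLU activation) as features.
    W, b = pretrained.coefs_[0], pretrained.intercepts_[0]
    return np.maximum(0.0, X @ W + b)

# Target task: stand-in for a smaller classification dataset.
X_tgt = rng.normal(size=(200, 40))
y_tgt = (X_tgt[:, :5].sum(axis=1) > 0).astype(int)

# Train a simple classifier on the transferred features.
clf = LogisticRegression().fit(extract_features(X_tgt), y_tgt)
```

The point of the sketch is the division of labor: the expensive supervised model is trained once on the large dataset, and only a cheap linear classifier is fit per target task, mirroring the paper's finding that such transferred features work well when the extractor is of limited complexity.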

Downloads

  • ISMIR 2014: full text | open access | PDF | 1.29 MB

Citation


MLA
van den Oord, Aäron, et al. “Transfer Learning by Supervised Pre-Training for Audio-Based Music Classification.” Conference of the International Society for Music Information Retrieval, Proceedings, 2014.
APA
van den Oord, A., Dieleman, S., & Schrauwen, B. (2014). Transfer learning by supervised pre-training for audio-based music classification. Conference of the International Society for Music Information Retrieval, Proceedings. Presented at the Conference of the International Society for Music Information Retrieval (ISMIR 2014), Taipei.
Chicago author-date
Oord, Aäron van den, Sander Dieleman, and Benjamin Schrauwen. 2014. “Transfer Learning by Supervised Pre-Training for Audio-Based Music Classification.” In Conference of the International Society for Music Information Retrieval, Proceedings.
Chicago author-date (all authors)
van den Oord, Aäron, Sander Dieleman, and Benjamin Schrauwen. 2014. “Transfer Learning by Supervised Pre-Training for Audio-Based Music Classification.” In Conference of the International Society for Music Information Retrieval, Proceedings.
Vancouver
1. van den Oord A, Dieleman S, Schrauwen B. Transfer learning by supervised pre-training for audio-based music classification. In: Conference of the International Society for Music Information Retrieval, Proceedings. 2014.
IEEE
[1] A. van den Oord, S. Dieleman, and B. Schrauwen, “Transfer learning by supervised pre-training for audio-based music classification,” in Conference of the International Society for Music Information Retrieval, Proceedings, Taipei, 2014.
@inproceedings{5973853,
  abstract     = {{Very few large-scale music research datasets are publicly available. There is an increasing need for such datasets, because the shift from physical to digital distribution in the music industry has given the listener access to a large body of music, which needs to be cataloged efficiently and be easily browsable. Additionally, deep learning and feature learning techniques are becoming increasingly popular for music information retrieval applications, and they typically require large amounts of training data to work well. In this paper, we propose to exploit an available large-scale music dataset, the Million Song Dataset (MSD), for classification tasks on other datasets, by reusing models trained on the MSD for feature extraction. This transfer learning approach, which we refer to as supervised pre-training, was previously shown to be very effective for computer vision problems. We show that features learned from MSD audio fragments in a supervised manner, using tag labels and user listening data, consistently outperform features learned in an unsupervised manner in this setting, provided that the learned feature extractor is of limited complexity. We evaluate our approach on the GTZAN, 1517-Artists, Unique and Magnatagatune datasets.}},
  author       = {{van den Oord, Aäron and Dieleman, Sander and Schrauwen, Benjamin}},
  booktitle    = {{Conference of the International Society for Music Information Retrieval, Proceedings}},
  language     = {{eng}},
  location     = {{Taipei}},
  pages        = {{6}},
  title        = {{Transfer learning by supervised pre-training for audio-based music classification}},
  year         = {{2014}},
}