
Data augmentation and semi-supervised learning for deep neural networks-based text classifier

Author
Heereen Shim, Stijn Luca, Dietwig Lowet, and Bart Vanrumste
Abstract
User feedback is essential for understanding user needs. In this paper, we use free-text responses from a survey on sleep-related issues to build a deep neural network-based text classifier. Training such a model, however, requires a large amount of labelled data. To reduce manual labelling, we propose a method that combines data augmentation and pseudo-labelling: data augmentation is applied to the labelled data to enlarge the initial training set, and the trained model is then used to annotate unlabelled data with pseudo-labels. The results show that the model with data augmentation achieves a macro-averaged F1 score of 65.2% using 4,300 training examples, whereas the model without data augmentation achieves a macro-averaged F1 score of 68.2% with around 14,000 training examples. Furthermore, when combined with pseudo-labelling, the model achieves a macro-averaged F1 score of 62.7% using only 1,400 labelled training examples. In other words, the proposed method reduces the amount of labelled data needed for training while maintaining relatively good performance.
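The pipeline described in the abstract (augment the labelled data, train, then pseudo-label the unlabelled data) can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration and is not taken from the paper: the word-swap `augment` function, the word-count classifier standing in for the deep neural network, and the confidence `threshold` used to filter pseudo-labels are all hypothetical.

```python
import random
from collections import Counter

def augment(text, rng, n_copies=2):
    """Hypothetical augmentation: swap two random words to create variants."""
    words = text.split()
    variants = []
    for _ in range(n_copies):
        w = words[:]
        if len(w) > 1:
            i, j = rng.sample(range(len(w)), 2)
            w[i], w[j] = w[j], w[i]
        variants.append(" ".join(w))
    return variants

def train(examples):
    """Stand-in for the neural classifier: per-label word counts."""
    model = {}
    for text, label in examples:
        model.setdefault(label, Counter()).update(text.split())
    return model

def predict(model, text):
    """Return (label, confidence); confidence is the top label's share of the total score."""
    words = text.split()
    scores = {
        label: sum(counts[w] for w in words) / max(1, sum(counts.values()))
        for label, counts in model.items()
    }
    total = sum(scores.values())
    if total == 0:
        return None, 0.0  # no known words: no prediction
    best = max(scores, key=scores.get)
    return best, scores[best] / total

def pseudo_label(labelled, unlabelled, threshold=0.8, seed=0):
    """Augment, train, pseudo-label confident examples, then retrain."""
    rng = random.Random(seed)
    # Step 1: enlarge the labelled set with augmented copies.
    train_set = list(labelled)
    for text, label in labelled:
        train_set += [(variant, label) for variant in augment(text, rng)]
    # Step 2: train the initial model on the enlarged set.
    model = train(train_set)
    # Step 3: annotate unlabelled texts, keeping only confident predictions
    # (the threshold is an assumption of this sketch).
    pseudo = []
    for text in unlabelled:
        label, confidence = predict(model, text)
        if label is not None and confidence >= threshold:
            pseudo.append((text, label))
    # Step 4: retrain on labelled, augmented, and pseudo-labelled data.
    return train(train_set + pseudo), pseudo

labelled = [
    ("cannot fall asleep at night", "insomnia"),
    ("loud snoring wakes my partner", "snoring"),
]
unlabelled = ["loud snoring again", "cannot fall asleep", "weather is nice"]
model, pseudo = pseudo_label(labelled, unlabelled)
```

In this toy run, the third unlabelled text shares no words with the labelled data, so it receives no pseudo-label and stays out of the retraining set.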
Keywords
Text classification, data augmentation, semi-supervised learning, deep neural networks

Downloads

  • SAC2020 - Data Augmentation and Semi-supervised Learning for DNN.pdf
    • full text (Published version), open access, PDF, 1.36 MB

Citation

MLA
Shim, Heereen, et al. “Data Augmentation and Semi-Supervised Learning for Deep Neural Networks-Based Text Classifier.” Proceedings of the 35th Annual ACM Symposium on Applied Computing (SAC’20), Association for Computing Machinery, 2020, pp. 1119–26, doi:10.1145/3341105.3373992.
APA
Shim, H., Luca, S., Lowet, D., & Vanrumste, B. (2020). Data augmentation and semi-supervised learning for deep neural networks-based text classifier. Proceedings of the 35th Annual ACM Symposium on Applied Computing (SAC’20), 1119–1126. https://doi.org/10.1145/3341105.3373992
Chicago author-date
Shim, Heereen, Stijn Luca, Dietwig Lowet, and Bart Vanrumste. 2020. “Data Augmentation and Semi-Supervised Learning for Deep Neural Networks-Based Text Classifier.” In Proceedings of the 35th Annual ACM Symposium on Applied Computing (SAC’20), 1119–26. New York: Association for Computing Machinery. https://doi.org/10.1145/3341105.3373992.
Chicago author-date (all authors)
Shim, Heereen, Stijn Luca, Dietwig Lowet, and Bart Vanrumste. 2020. “Data Augmentation and Semi-Supervised Learning for Deep Neural Networks-Based Text Classifier.” In Proceedings of the 35th Annual ACM Symposium on Applied Computing (SAC’20), 1119–1126. New York: Association for Computing Machinery. doi:10.1145/3341105.3373992.
Vancouver
1. Shim H, Luca S, Lowet D, Vanrumste B. Data augmentation and semi-supervised learning for deep neural networks-based text classifier. In: Proceedings of the 35th Annual ACM Symposium on Applied Computing (SAC’20). New York: Association for Computing Machinery; 2020. p. 1119–26.
IEEE
[1] H. Shim, S. Luca, D. Lowet, and B. Vanrumste, “Data augmentation and semi-supervised learning for deep neural networks-based text classifier,” in Proceedings of the 35th Annual ACM Symposium on Applied Computing (SAC’20), Czech Tech Univ, Electr Network, 2020, pp. 1119–1126.
@inproceedings{8656299,
  abstract     = {{User feedback is essential for understanding user needs. In this paper, we use free-text responses from a survey on sleep-related issues to build a deep neural network-based text classifier. Training such a model, however, requires a large amount of labelled data. To reduce manual labelling, we propose a method that combines data augmentation and pseudo-labelling: data augmentation is applied to the labelled data to enlarge the initial training set, and the trained model is then used to annotate unlabelled data with pseudo-labels. The results show that the model with data augmentation achieves a macro-averaged F1 score of 65.2\% using 4,300 training examples, whereas the model without data augmentation achieves a macro-averaged F1 score of 68.2\% with around 14,000 training examples. Furthermore, when combined with pseudo-labelling, the model achieves a macro-averaged F1 score of 62.7\% using only 1,400 labelled training examples. In other words, the proposed method reduces the amount of labelled data needed for training while maintaining relatively good performance.}},
  author       = {Shim, Heereen and Luca, Stijn and Lowet, Dietwig and Vanrumste, Bart},
  booktitle    = {{Proceedings of the 35th Annual ACM Symposium on Applied Computing (SAC'20)}},
  isbn         = {{9781450368667}},
  keywords     = {{Text classification,data augmentation,semi-supervised learning,deep neural networks}},
  language     = {{eng}},
  location     = {{Czech Tech Univ, Electr Network}},
  pages        = {{1119--1126}},
  publisher    = {{Association for Computing Machinery}},
  title        = {{Data augmentation and semi-supervised learning for deep neural networks-based text classifier}},
  url          = {{http://doi.org/10.1145/3341105.3373992}},
  year         = {{2020}},
}
