
Dual Rectified Linear Units (DReLUs): A replacement for tanh activation functions in Quasi-Recurrent Neural Networks

Fréderic Godin (UGent) , Jonas Degrave (UGent) , Joni Dambre (UGent) and Wesley De Neve (UGent)
Abstract
In this paper, we introduce a novel type of Rectified Linear Unit (ReLU), called a Dual Rectified Linear Unit (DReLU). A DReLU, which comes with an unbounded positive and negative image, can be used as a drop-in replacement for a tanh activation function in the recurrent step of Quasi-Recurrent Neural Networks (QRNNs) [1]. Similar to ReLUs, DReLUs are less prone to the vanishing gradient problem, they are noise robust, and they induce sparse activations. We independently reproduce the QRNN experiments of Bradbury et al. [1] and compare our DReLU-based QRNNs with the original tanh-based QRNNs and Long Short-Term Memory networks (LSTMs) on sentiment classification and word-level language modeling. Additionally, we evaluate on character-level language modeling, showing that we are able to stack up to eight QRNN layers with DReLUs, thus making it possible to improve the current state of the art in character-level language modeling over shallow architectures based on LSTMs. (C) 2018 Elsevier B.V. All rights reserved.
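As an illustration of the "unbounded positive and negative image" mentioned in the abstract, the following is a minimal sketch, not the authors' code. It assumes the DReLU takes two separate pre-activations and returns the difference of their ReLUs; this record does not state the formula, so the function names `relu` and `drelu` and the two-argument form are assumptions made for illustration.

```python
def relu(x: float) -> float:
    """Standard rectified linear unit: max(0, x)."""
    return max(0.0, x)


def drelu(a: float, b: float) -> float:
    """Assumed DReLU form: difference of two ReLUs on separate inputs.

    The result is positive when relu(a) dominates, negative when relu(b)
    dominates, and exactly zero when both inputs are non-positive.
    """
    return relu(a) - relu(b)


if __name__ == "__main__":
    # A few hand-picked input pairs covering the three regimes.
    for a, b in [(2.0, 0.5), (-1.0, 3.0), (-1.0, -2.0)]:
        print(a, b, drelu(a, b))
```

Under this assumed form the output can grow without bound in either direction, unlike tanh, and it is exactly zero whenever both inputs are non-positive, which is consistent with the sparse activations the abstract describes.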
Keywords
Activation functions, ReLU, Dual Rectified Linear Unit, Recurrent Neural Networks, Language modeling

Citation


Chicago
Godin, Fréderic, Jonas Degrave, Joni Dambre, and Wesley De Neve. 2018. “Dual Rectified Linear Units (DReLUs): A Replacement for Tanh Activation Functions in Quasi-Recurrent Neural Networks.” Pattern Recognition Letters 116: 8–14.
APA
Godin, F., Degrave, J., Dambre, J., & De Neve, W. (2018). Dual Rectified Linear Units (DReLUs): A replacement for tanh activation functions in Quasi-Recurrent Neural Networks. Pattern Recognition Letters, 116, 8–14.
Vancouver
1. Godin F, Degrave J, Dambre J, De Neve W. Dual Rectified Linear Units (DReLUs): A replacement for tanh activation functions in Quasi-Recurrent Neural Networks. Pattern Recognition Letters. Amsterdam: Elsevier Science Bv; 2018;116:8–14.
MLA
Godin, Fréderic, Jonas Degrave, Joni Dambre, et al. “Dual Rectified Linear Units (DReLUs): A Replacement for Tanh Activation Functions in Quasi-Recurrent Neural Networks.” Pattern Recognition Letters 116 (2018): 8–14. Print.
@article{8584126,
  abstract     = {In this paper, we introduce a novel type of Rectified Linear Unit (ReLU), called a Dual Rectified Linear Unit (DReLU). A DReLU, which comes with an unbounded positive and negative image, can be used as a drop-in replacement for a tanh activation function in the recurrent step of Quasi-Recurrent Neural Networks (QRNNs) [1]. Similar to ReLUs, DReLUs are less prone to the vanishing gradient problem, they are noise robust, and they induce sparse activations. We independently reproduce the QRNN experiments of Bradbury et al. [1] and compare our DReLU-based QRNNs with the original tanh-based QRNNs and Long Short-Term Memory networks (LSTMs) on sentiment classification and word-level language modeling. Additionally, we evaluate on character-level language modeling, showing that we are able to stack up to eight QRNN layers with DReLUs, thus making it possible to improve the current state of the art in character-level language modeling over shallow architectures based on LSTMs. (C) 2018 Elsevier B.V. All rights reserved.},
  author       = {Godin, Fr{\'e}deric and Degrave, Jonas and Dambre, Joni and De Neve, Wesley},
  issn         = {0167-8655},
  journal      = {Pattern Recognition Letters},
  keyword      = {Activation functions,ReLU,Dual Rectified Linear Unit,Recurrent Neural Networks,Language modeling},
  language     = {eng},
  pages        = {8--14},
  publisher    = {Elsevier Science Bv},
  title        = {Dual Rectified Linear Units (DReLUs): A replacement for tanh activation functions in Quasi-Recurrent Neural Networks},
  url          = {http://dx.doi.org/10.1016/j.patrec.2018.09.006},
  volume       = {116},
  year         = {2018},
}
