
Dual Rectified Linear Units (DReLUs) : a replacement for tanh activation functions in quasi-recurrent neural networks
- Author
- Fréderic Godin, Jonas Degrave (UGent), Joni Dambre (UGent) and Wesley De Neve (UGent)
- Abstract
- In this paper, we introduce a novel type of Rectified Linear Unit (ReLU), called a Dual Rectified Linear Unit (DReLU). A DReLU, which comes with an unbounded positive and negative image, can be used as a drop-in replacement for a tanh activation function in the recurrent step of Quasi-Recurrent Neural Networks (QRNNs) [1]. Similar to ReLUs, DReLUs are less prone to the vanishing gradient problem, they are noise robust, and they induce sparse activations. We independently reproduce the QRNN experiments of Bradbury et al. [1] and compare our DReLU-based QRNNs with the original tanh-based QRNNs and Long Short-Term Memory networks (LSTMs) on sentiment classification and word-level language modeling. Additionally, we evaluate on character-level language modeling, showing that we are able to stack up to eight QRNN layers with DReLUs, thus making it possible to improve the current state-of-the-art in character-level language modeling over shallow architectures based on LSTMs. (C) 2018 Elsevier B.V. All rights reserved.
- Keywords
- Activation functions, ReLU, Dual Rectified Linear Unit, Recurrent Neural Networks, Language modeling
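The abstract describes a DReLU as a ReLU variant whose output is unbounded in both the positive and the negative direction and which still produces sparse activations. This record does not give the formula, so the sketch below assumes the commonly cited two-input form, max(0, a) - max(0, b). The function names `relu`, `drelu`, and `drelu_layer` are hypothetical and chosen only for illustration; this is not the authors' implementation.

```python
def relu(x: float) -> float:
    """Standard rectified linear unit: max(0, x)."""
    return max(0.0, x)


def drelu(a: float, b: float) -> float:
    """Assumed DReLU form: the difference of two rectified inputs.

    The result can be arbitrarily large and positive (large a, b <= 0),
    arbitrarily large and negative (a <= 0, large b), or exactly zero
    (both inputs <= 0, or both rectified values equal).
    """
    return relu(a) - relu(b)


def drelu_layer(a_values, b_values):
    """Apply the assumed DReLU element-wise to two equal-length sequences."""
    return [drelu(a, b) for a, b in zip(a_values, b_values)]


if __name__ == "__main__":
    # Two hypothetical pre-activation vectors of the same length.
    print(drelu_layer([2.0, -1.0, -1.0, 3.0], [0.5, 3.0, -2.0, 3.0]))
    # prints [1.5, -3.0, 0.0, 0.0]
```

Under this assumption, the output range matches the sign behaviour of tanh without being bounded to (-1, 1), and the exact zeros in the example illustrate the sparse activations mentioned in the abstract.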
Citation
Please use this url to cite or link to this publication: http://hdl.handle.net/1854/LU-8584126
- MLA
- Godin, Fréderic, et al. “Dual Rectified Linear Units (DReLUs) : A Replacement for Tanh Activation Functions in Quasi-Recurrent Neural Networks.” PATTERN RECOGNITION LETTERS, vol. 116, Elsevier Science Bv, 2018, pp. 8–14, doi:10.1016/j.patrec.2018.09.006.
- APA
- Godin, F., Degrave, J., Dambre, J., & De Neve, W. (2018). Dual Rectified Linear Units (DReLUs) : a replacement for tanh activation functions in quasi-recurrent neural networks. PATTERN RECOGNITION LETTERS, 116, 8–14. https://doi.org/10.1016/j.patrec.2018.09.006
- Chicago author-date
- Godin, Fréderic, Jonas Degrave, Joni Dambre, and Wesley De Neve. 2018. “Dual Rectified Linear Units (DReLUs) : A Replacement for Tanh Activation Functions in Quasi-Recurrent Neural Networks.” PATTERN RECOGNITION LETTERS 116: 8–14. https://doi.org/10.1016/j.patrec.2018.09.006.
- Chicago author-date (all authors)
- Godin, Fréderic, Jonas Degrave, Joni Dambre, and Wesley De Neve. 2018. “Dual Rectified Linear Units (DReLUs) : A Replacement for Tanh Activation Functions in Quasi-Recurrent Neural Networks.” PATTERN RECOGNITION LETTERS 116: 8–14. doi:10.1016/j.patrec.2018.09.006.
- Vancouver
- 1. Godin F, Degrave J, Dambre J, De Neve W. Dual Rectified Linear Units (DReLUs) : a replacement for tanh activation functions in quasi-recurrent neural networks. PATTERN RECOGNITION LETTERS. 2018;116:8–14.
- IEEE
- [1] F. Godin, J. Degrave, J. Dambre, and W. De Neve, “Dual Rectified Linear Units (DReLUs) : a replacement for tanh activation functions in quasi-recurrent neural networks,” PATTERN RECOGNITION LETTERS, vol. 116, pp. 8–14, 2018.
@article{8584126,
  abstract = {{In this paper, we introduce a novel type of Rectified Linear Unit (ReLU), called a Dual Rectified Linear Unit (DReLU). A DReLU, which comes with an unbounded positive and negative image, can be used as a drop-in replacement for a tanh activation function in the recurrent step of Quasi-Recurrent Neural Networks (QRNNs) [1]. Similar to ReLUs, DReLUs are less prone to the vanishing gradient problem, they are noise robust, and they induce sparse activations. We independently reproduce the QRNN experiments of Bradbury et al. [1] and compare our DReLU-based QRNNs with the original tanh-based QRNNs and Long Short-Term Memory networks (LSTMs) on sentiment classification and word-level language modeling. Additionally, we evaluate on character-level language modeling, showing that we are able to stack up to eight QRNN layers with DReLUs, thus making it possible to improve the current state-of-the-art in character-level language modeling over shallow architectures based on LSTMs. (C) 2018 Elsevier B.V. All rights reserved.}},
  author = {{Godin, Fréderic and Degrave, Jonas and Dambre, Joni and De Neve, Wesley}},
  issn = {{0167-8655}},
  journal = {{PATTERN RECOGNITION LETTERS}},
  keywords = {{Activation functions,ReLU,Dual Rectified Linear Unit,Recurrent Neural Networks,Language modeling}},
  language = {{eng}},
  pages = {{8--14}},
  publisher = {{Elsevier Science Bv}},
  title = {{Dual Rectified Linear Units (DReLUs) : a replacement for tanh activation functions in quasi-recurrent neural networks}},
  url = {{http://doi.org/10.1016/j.patrec.2018.09.006}},
  volume = {{116}},
  year = {{2018}},
}