
Acoustic modeling with hierarchical reservoirs
- Author
- Fabian Triefenbach (UGent) , Azarakhsh Jalalvand (UGent) , Kris Demuynck (UGent) and Jean-Pierre Martens (UGent)
- Abstract
- Accurate acoustic modeling is an essential requirement of a state-of-the-art continuous speech recognizer. The Acoustic Model (AM) describes the relation between the observed speech signal and the non-observable sequence of phonetic units uttered by the speaker. Nowadays, most recognizers use Hidden Markov Models (HMMs) in combination with Gaussian Mixture Models (GMMs) to model the acoustics, but neural-based architectures are on the rise again. In this work, the recently introduced Reservoir Computing (RC) paradigm is used for acoustic modeling. A reservoir is a fixed – and thus non-trained – Recurrent Neural Network (RNN) that is combined with a trained linear model. This approach combines the ability of an RNN to model the recent past of the input sequence with a simple and reliable training procedure. It is shown here that simple reservoir-based AMs achieve reasonable phone recognition and that deep hierarchical and bi-directional reservoir architectures lead to a very competitive Phone Error Rate (PER) of 23.1% on the well-known TIMIT task.
- Keywords
- reservoir computing, recurrent neural networks, automatic speech recognition, Acoustic modeling, TIME, NETS, NEURONS, REPRESENTATIONS, PHONEME RECOGNITION, NEURAL-NETWORKS, ECHO STATE NETWORKS, HIDDEN MARKOV-MODELS, AUTOMATIC SPEECH RECOGNITION
Citation
Please use this url to cite or link to this publication: http://hdl.handle.net/1854/LU-3237631
- MLA
- Triefenbach, Fabian, et al. “Acoustic Modeling with Hierarchical Reservoirs.” IEEE TRANSACTIONS ON AUDIO, SPEECH AND LANGUAGE PROCESSING, vol. 21, no. 11, 2013, pp. 2439–50, doi:10.1109/TASL.2013.2280209.
- APA
- Triefenbach, F., Jalalvand, A., Demuynck, K., & Martens, J.-P. (2013). Acoustic modeling with hierarchical reservoirs. IEEE TRANSACTIONS ON AUDIO, SPEECH AND LANGUAGE PROCESSING, 21(11), 2439–2450. https://doi.org/10.1109/TASL.2013.2280209
- Chicago author-date
- Triefenbach, Fabian, Azarakhsh Jalalvand, Kris Demuynck, and Jean-Pierre Martens. 2013. “Acoustic Modeling with Hierarchical Reservoirs.” IEEE TRANSACTIONS ON AUDIO, SPEECH AND LANGUAGE PROCESSING 21 (11): 2439–50. https://doi.org/10.1109/TASL.2013.2280209.
- Chicago author-date (all authors)
- Triefenbach, Fabian, Azarakhsh Jalalvand, Kris Demuynck, and Jean-Pierre Martens. 2013. “Acoustic Modeling with Hierarchical Reservoirs.” IEEE TRANSACTIONS ON AUDIO, SPEECH AND LANGUAGE PROCESSING 21 (11): 2439–2450. doi:10.1109/TASL.2013.2280209.
- Vancouver
- 1. Triefenbach F, Jalalvand A, Demuynck K, Martens J-P. Acoustic modeling with hierarchical reservoirs. IEEE TRANSACTIONS ON AUDIO, SPEECH AND LANGUAGE PROCESSING. 2013;21(11):2439–50.
- IEEE
- [1] F. Triefenbach, A. Jalalvand, K. Demuynck, and J.-P. Martens, “Acoustic modeling with hierarchical reservoirs,” IEEE TRANSACTIONS ON AUDIO, SPEECH AND LANGUAGE PROCESSING, vol. 21, no. 11, pp. 2439–2450, 2013.
@article{3237631,
  abstract  = {{Accurate acoustic modeling is an essential requirement of a state-of-the-art continuous speech recognizer. The Acoustic Model (AM) describes the relation between the observed speech signal and the non-observable sequence of phonetic units uttered by the speaker. Nowadays, most recognizers use Hidden Markov Models (HMMs) in combination with Gaussian Mixture Models (GMMs) to model the acoustics, but neural-based architectures are on the rise again. In this work, the recently introduced Reservoir Computing (RC) paradigm is used for acoustic modeling. A reservoir is a fixed – and thus non-trained – Recurrent Neural Network (RNN) that is combined with a trained linear model. This approach combines the ability of an RNN to model the recent past of the input sequence with a simple and reliable training procedure. It is shown here that simple reservoir-based AMs achieve reasonable phone recognition and that deep hierarchical and bi-directional reservoir architectures lead to a very competitive Phone Error Rate (PER) of 23.1% on the well-known TIMIT task.}},
  author    = {Triefenbach, Fabian and Jalalvand, Azarakhsh and Demuynck, Kris and Martens, Jean-Pierre},
  issn      = {{1558-7916}},
  journal   = {{IEEE TRANSACTIONS ON AUDIO, SPEECH AND LANGUAGE PROCESSING}},
  keywords  = {{reservoir computing,recurrent neural networks,automatic speech recognition,Acoustic modeling,TIME,NETS,NEURONS,REPRESENTATIONS,PHONEME RECOGNITION,NEURAL-NETWORKS,ECHO STATE NETWORKS,HIDDEN MARKOV-MODELS,AUTOMATIC SPEECH RECOGNITION}},
  language  = {{eng}},
  number    = {{11}},
  pages     = {{2439--2450}},
  title     = {{Acoustic modeling with hierarchical reservoirs}},
  url       = {{http://doi.org/10.1109/TASL.2013.2280209}},
  volume    = {{21}},
  year      = {{2013}},
}