
Acoustic modeling with hierarchical reservoirs

Author
Fabian Triefenbach, Azarakhsh Jalalvand, Kris Demuynck, Jean-Pierre Martens
Abstract
Accurate acoustic modeling is an essential requirement of a state-of-the-art continuous speech recognizer. The Acoustic Model (AM) describes the relation between the observed speech signal and the non-observable sequence of phonetic units uttered by the speaker. Nowadays, most recognizers use Hidden Markov Models (HMMs) in combination with Gaussian Mixture Models (GMMs) to model the acoustics, but neural-based architectures are on the rise again. In this work, the recently introduced Reservoir Computing (RC) paradigm is used for acoustic modeling. A reservoir is a fixed – and thus non-trained – Recurrent Neural Network (RNN) that is combined with a trained linear model. This approach combines the ability of an RNN to model the recent past of the input sequence with a simple and reliable training procedure. It is shown here that simple reservoir-based AMs achieve reasonable phone recognition and that deep hierarchical and bi-directional reservoir architectures lead to a very competitive Phone Error Rate (PER) of 23.1% on the well-known TIMIT task.
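The abstract's central idea is a fixed, untrained RNN (the reservoir) whose states feed a trained linear model. The NumPy snippet below is a minimal sketch of that idea under our own assumptions and is not the authors' implementation. The leaky-integrator update, the spectral-radius scaling, the ridge-regression readout, all hyperparameter values, and the toy "previous symbol" task are hypothetical choices made for illustration. A second layer is stacked in the spirit of the hierarchical architecture: it is driven by the first layer's output scores and trained on the same targets. The bi-directional variant is not shown.

```python
import numpy as np


def make_reservoir(n_in, n_res, spectral_radius=0.9, input_scale=0.5, seed=0):
    """Draw fixed random input and recurrent weights; these are never trained."""
    rng = np.random.default_rng(seed)
    w_in = rng.uniform(-input_scale, input_scale, size=(n_res, n_in))
    w_res = rng.standard_normal((n_res, n_res))
    # Rescale so the largest absolute eigenvalue equals spectral_radius.
    w_res *= spectral_radius / np.max(np.abs(np.linalg.eigvals(w_res)))
    return w_in, w_res


def run_reservoir(w_in, w_res, inputs, leak=0.3):
    """Drive the reservoir with a (T, n_in) sequence and return (T, n_res) states."""
    state = np.zeros(w_res.shape[0])
    states = np.empty((inputs.shape[0], w_res.shape[0]))
    for t, u in enumerate(inputs):
        # Leaky-integrator update (assumed): keep part of the old state, add a nonlinear new part.
        state = (1.0 - leak) * state + leak * np.tanh(w_in @ u + w_res @ state)
        states[t] = state
    return states


def with_bias(states):
    """Append a constant column so the linear readout has a bias term."""
    return np.hstack([states, np.ones((states.shape[0], 1))])


def train_readout(states, targets, ridge=1e-2):
    """Fit the linear readout by ridge regression; this is the only trained part."""
    x = with_bias(states)
    return np.linalg.solve(x.T @ x + ridge * np.eye(x.shape[1]), x.T @ targets)


def train_layer(inputs, targets, n_res=100, seed=0):
    """One layer = fixed reservoir + trained readout; returns the readout's outputs."""
    w_in, w_res = make_reservoir(inputs.shape[1], n_res, seed=seed)
    states = run_reservoir(w_in, w_res, inputs)
    w_out = train_readout(states, targets)
    return with_bias(states) @ w_out


# Hypothetical toy data: 3 "phone" symbols; each frame's target is the PREVIOUS symbol,
# so the readout can only succeed if the reservoir remembers the recent past.
rng = np.random.default_rng(42)
symbols = rng.integers(0, 3, size=2000)
inputs = np.eye(3)[symbols]
previous = np.concatenate([[0], symbols[:-1]])
targets = np.eye(3)[previous]

# Hierarchy: layer 2 is driven by the class scores produced by layer 1.
out1 = train_layer(inputs, targets, seed=0)
out2 = train_layer(out1, targets, seed=1)

acc1 = np.mean(np.argmax(out1, axis=1) == previous)
acc2 = np.mean(np.argmax(out2, axis=1) == previous)
print(f"layer 1 frame accuracy: {acc1:.3f}, layer 2: {acc2:.3f}")
```

Both accuracies are measured on the training frames, so they only show that a purely linear readout can exploit the reservoir's memory of past inputs. They say nothing about phone recognition on real acoustic features such as those of TIMIT. Only `train_readout` involves learning, which is what keeps training simple compared with back-propagating through the recurrent weights.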
Keywords
reservoir computing, recurrent neural networks, automatic speech recognition, acoustic modeling, time, nets, neurons, representations, phoneme recognition, neural networks, echo state networks, hidden Markov models

Downloads

(...).pdf: full text, UGent only, PDF, 1.78 MB

Citation

Please use this url to cite or link to this publication:

MLA
Triefenbach, Fabian, et al. “Acoustic Modeling with Hierarchical Reservoirs.” IEEE Transactions on Audio, Speech, and Language Processing, vol. 21, no. 11, 2013, pp. 2439–50, doi:10.1109/TASL.2013.2280209.
APA
Triefenbach, F., Jalalvand, A., Demuynck, K., & Martens, J.-P. (2013). Acoustic modeling with hierarchical reservoirs. IEEE Transactions on Audio, Speech, and Language Processing, 21(11), 2439–2450. https://doi.org/10.1109/TASL.2013.2280209
Chicago author-date
Triefenbach, Fabian, Azarakhsh Jalalvand, Kris Demuynck, and Jean-Pierre Martens. 2013. “Acoustic Modeling with Hierarchical Reservoirs.” IEEE Transactions on Audio, Speech, and Language Processing 21 (11): 2439–50. https://doi.org/10.1109/TASL.2013.2280209.
Chicago author-date (all authors)
Triefenbach, Fabian, Azarakhsh Jalalvand, Kris Demuynck, and Jean-Pierre Martens. 2013. “Acoustic Modeling with Hierarchical Reservoirs.” IEEE Transactions on Audio, Speech, and Language Processing 21 (11): 2439–2450. doi:10.1109/TASL.2013.2280209.
Vancouver
1.
Triefenbach F, Jalalvand A, Demuynck K, Martens J-P. Acoustic modeling with hierarchical reservoirs. IEEE Transactions on Audio, Speech, and Language Processing. 2013;21(11):2439–50.
IEEE
[1]
F. Triefenbach, A. Jalalvand, K. Demuynck, and J.-P. Martens, “Acoustic modeling with hierarchical reservoirs,” IEEE Transactions on Audio, Speech, and Language Processing, vol. 21, no. 11, pp. 2439–2450, 2013.
BibTeX
@article{3237631,
  abstract     = {{Accurate acoustic modeling is an essential requirement of a state-of-the-art continuous speech recognizer. The Acoustic Model (AM) describes the relation between the observed speech signal and the non-observable sequence of phonetic units uttered by the speaker. Nowadays, most recognizers use Hidden Markov Models (HMMs) in combination with Gaussian Mixture Models (GMMs) to model the acoustics, but neural-based architectures are on the rise again. In this work, the recently introduced Reservoir Computing (RC) paradigm is used for acoustic modeling. A reservoir is a fixed – and thus non-trained – Recurrent Neural Network (RNN) that is combined with a trained linear model. This approach combines the ability of an RNN to model the recent past of the input sequence with a simple and reliable training procedure. It is shown here that simple reservoir-based AMs achieve reasonable phone recognition and that deep hierarchical and bi-directional reservoir architectures lead to a very competitive Phone Error Rate (PER) of 23.1% on the well-known TIMIT task.}},
  author       = {{Triefenbach, Fabian and Jalalvand, Azarakhsh and Demuynck, Kris and Martens, Jean-Pierre}},
  issn         = {{1558-7916}},
  journal      = {{IEEE Transactions on Audio, Speech, and Language Processing}},
  keywords     = {{reservoir computing,recurrent neural networks,automatic speech recognition,acoustic modeling,time,nets,neurons,representations,phoneme recognition,neural networks,echo state networks,hidden Markov models}},
  language     = {{eng}},
  number       = {{11}},
  pages        = {{2439--2450}},
  title        = {{Acoustic modeling with hierarchical reservoirs}},
  url          = {{https://doi.org/10.1109/TASL.2013.2280209}},
  volume       = {{21}},
  year         = {{2013}},
}
