Context-dependent modeling and speaker normalization applied to reservoir-based phone recognition

Abstract
Reservoir Computing (RC) has recently been introduced as an interesting alternative for acoustic modeling. For phone and continuous digit recognition, the reservoir approach obtained quite promising results. In this work, we further elaborate this concept by porting some well-known techniques used to enhance recognition rates of GMM-based models to Reservoir Computing. In particular, we introduce context-dependent (CD) triphone states to model co-articulation and pronunciation mismatches arising from an imperfect lexicon. We also propose to incorporate two speaker normalization methods in the feature space, namely mean and variance normalization and vocal tract length normalization. The impact of the investigated techniques is studied in the context of phone recognition on the TIMIT corpus. Our CD-RC-HMM hybrid yields a speaker-independent phone error rate (PER) of 22% and a speaker-dependent PER of 20.5%. By combining GMM and RC-based likelihoods at the state level, these scores can be reduced further.
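
For readers unfamiliar with the first normalization method named in the abstract, the following is a minimal sketch of mean and variance normalization in the feature space. It is not taken from the paper: the function name `mean_variance_normalize`, the `eps` guard, and the choice to compute statistics over all frames passed in (for example, all frames of one speaker) are assumptions made for illustration only.

```python
import numpy as np


def mean_variance_normalize(features, eps=1e-8):
    """Scale each feature dimension to zero mean and unit variance.

    features: array of shape (num_frames, num_dims), assumed to hold
    the acoustic feature vectors of one speaker (hypothetical layout).
    eps: small constant (an assumption) guarding against division by
    zero for constant dimensions.
    """
    features = np.asarray(features, dtype=float)
    mean = features.mean(axis=0)  # per-dimension mean over frames
    std = features.std(axis=0)    # per-dimension standard deviation
    return (features - mean) / (std + eps)


if __name__ == "__main__":
    # Synthetic stand-in for one speaker's feature vectors.
    rng = np.random.default_rng(0)
    frames = rng.normal(loc=5.0, scale=3.0, size=(200, 13))
    normalized = mean_variance_normalize(frames)
    print(normalized.mean(axis=0).round(3))
    print(normalized.std(axis=0).round(3))
```

Whether the statistics are gathered per utterance or per speaker changes how much speaker variability is removed; the sketch leaves that choice to the caller.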

Citation

Chicago
Triefenbach, Fabian, Azarakhsh Jalalvand, Kris Demuynck, and Jean-Pierre Martens. 2013. “Context-dependent Modeling and Speaker Normalization Applied to Reservoir-based Phone Recognition.” In 14th Annual Conference of the International Speech Communication Association, Proceedings, 3342–3346.
APA
Triefenbach, F., Jalalvand, A., Demuynck, K., & Martens, J.-P. (2013). Context-dependent modeling and speaker normalization applied to reservoir-based phone recognition. 14th Annual conference of the International Speech Communication Association, Proceedings (pp. 3342–3346). Presented at the 14th Annual conference of the International Speech Communication Association.
Vancouver
1. Triefenbach F, Jalalvand A, Demuynck K, Martens J-P. Context-dependent modeling and speaker normalization applied to reservoir-based phone recognition. 14th Annual conference of the International Speech Communication Association, Proceedings. 2013. p. 3342–6.
MLA
Triefenbach, Fabian, Azarakhsh Jalalvand, Kris Demuynck, et al. “Context-dependent Modeling and Speaker Normalization Applied to Reservoir-based Phone Recognition.” 14th Annual Conference of the International Speech Communication Association, Proceedings. 2013. 3342–3346. Print.
@inproceedings{3237634,
  abstract     = {Reservoir Computing (RC) has recently been introduced as an interesting alternative for acoustic modeling. For phone and continuous digit recognition, the reservoir approach obtained quite promising results. In this work, we further elaborate this concept by porting some well-known techniques used to enhance recognition rates of GMM-based models to Reservoir Computing. In particular, we introduce context-dependent (CD) triphone states to model co-articulation and pronunciation mismatches arising from an imperfect lexicon. We also propose to incorporate two speaker normalization methods in the feature space, namely mean \& variance normalization and vocal tract length normalization. The impact of the investigated techniques is studied in the context of phone recognition on the TIMIT corpus. Our CD-RC-HMM hybrid yields a speaker-independent phone error rate (PER) of 22\% and a speaker-dependent PER of 20.5\%. By combining GMM and RC-based likelihoods at the state level, these scores can be reduced further.},
  author       = {Triefenbach, Fabian and Jalalvand, Azarakhsh and Demuynck, Kris and Martens, Jean-Pierre},
  booktitle    = {14th Annual conference of the International Speech Communication Association, Proceedings},
  isbn         = {9781629934433},
  issn         = {2308-457X},
  language     = {eng},
  location     = {Lyon, France},
  pages        = {3342--3346},
  title        = {Context-dependent modeling and speaker normalization applied to reservoir-based phone recognition},
  year         = {2013},
}