
Deep dynamic neural networks for multimodal gesture segmentation and recognition

Abstract
This paper describes a novel method called Deep Dynamic Neural Networks (DDNN) for multimodal gesture recognition. A semi-supervised hierarchical dynamic framework based on a Hidden Markov Model (HMM) is proposed for simultaneous gesture segmentation and recognition, where skeleton joint information, depth, and RGB images are the multimodal input observations. Unlike most traditional approaches that rely on the construction of complex handcrafted features, our approach learns high-level spatio-temporal representations using deep neural networks suited to the input modality: a Gaussian-Bernoulli Deep Belief Network (DBN) to handle skeletal dynamics, and a 3D Convolutional Neural Network (3DCNN) to manage and fuse batches of depth and RGB images. This is achieved through the modeling and learning of the emission probabilities of the HMM required to infer the gesture sequence. This purely data-driven approach achieves a Jaccard index score of 0.81 in the ChaLearn LAP gesture spotting challenge. The performance is on par with a variety of state-of-the-art hand-tuned feature-based approaches and other learning-based methods, thereby opening the door to the use of deep learning techniques to further explore multimodal time series data.
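The ChaLearn LAP spotting challenge scores predictions with the Jaccard index, i.e. the intersection over union between predicted and ground-truth gesture frames. A minimal per-class sketch of that frame-level computation (the official metric additionally averages over gesture classes and sequences, which is omitted here):

```python
import numpy as np

def jaccard_index(pred, gt):
    """Frame-level Jaccard index between two binary frame masks
    marking where one gesture class is active."""
    pred = np.asarray(pred, dtype=bool)
    gt = np.asarray(gt, dtype=bool)
    union = np.logical_or(pred, gt).sum()
    if union == 0:
        return 1.0  # both masks empty: perfect agreement
    return np.logical_and(pred, gt).sum() / union

# Example: predicted gesture span vs. ground truth over 10 frames.
gt   = [0, 0, 1, 1, 1, 1, 0, 0, 0, 0]
pred = [0, 0, 0, 1, 1, 1, 1, 0, 0, 0]
print(round(jaccard_index(pred, gt), 2))  # 3 shared frames / 5 in union = 0.6
```

A prediction that overlaps the ground-truth span only partially is thus penalized on both the missed frames and the spurious ones, which is what makes the index a joint segmentation-and-recognition score.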
Keywords
hidden Markov models, gesture recognition, deep belief networks, convolutional neural networks, deep learning
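In hybrid neural-network/HMM frameworks of the kind the abstract describes, the network's per-frame state posteriors are typically converted into scaled likelihoods by dividing out the state priors, and the gesture sequence is then inferred by Viterbi decoding. A sketch of that decoding step, using illustrative names and toy numbers rather than the paper's actual model:

```python
import numpy as np

def scaled_log_likelihoods(log_posteriors, state_priors):
    """Turn per-frame network posteriors log p(s|x_t) into scaled
    likelihoods log p(x_t|s) (up to an additive constant per frame)."""
    return log_posteriors - np.log(state_priors)

def viterbi(log_emis, log_trans, log_init):
    """Most likely HMM state path under log-domain scores."""
    T, S = log_emis.shape
    delta = log_init + log_emis[0]           # best score ending in each state
    back = np.zeros((T, S), dtype=int)       # backpointers
    for t in range(1, T):
        scores = delta[:, None] + log_trans  # rows: from-state, cols: to-state
        back[t] = scores.argmax(axis=0)
        delta = scores.max(axis=0) + log_emis[t]
    path = [int(delta.argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]

# Toy example: 3 frames, 2 states, uniform priors and transitions,
# and posteriors that clearly favor the state sequence 0, 0, 1.
post = np.array([[0.9, 0.1], [0.8, 0.2], [0.1, 0.9]])
emis = scaled_log_likelihoods(np.log(post), np.array([0.5, 0.5]))
uniform = np.log(np.full((2, 2), 0.5))
print(viterbi(emis, uniform, np.log(np.full(2, 0.5))))  # [0, 0, 1]
```

With non-uniform transition scores, the decoder trades per-frame evidence against temporal continuity, which is what lets a single pass segment and label the stream simultaneously.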


Citation


MLA
Wu, Di, et al. “Deep Dynamic Neural Networks for Multimodal Gesture Segmentation and Recognition.” IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, vol. 38, no. 8, IEEE, 2016, pp. 1583–97, doi:10.1109/TPAMI.2016.2537340.
APA
Wu, D., Pigou, L., Kindermans, P.-J., LE, N., Shao, L., Dambre, J., & Odobez, J.-M. (2016). Deep dynamic neural networks for multimodal gesture segmentation and recognition. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 38(8), 1583–1597. https://doi.org/10.1109/TPAMI.2016.2537340
Chicago author-date
Wu, Di, Lionel Pigou, Pieter-Jan Kindermans, Nam LE, Ling Shao, Joni Dambre, and Jean-Marc Odobez. 2016. “Deep Dynamic Neural Networks for Multimodal Gesture Segmentation and Recognition.” IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE 38 (8): 1583–97. https://doi.org/10.1109/TPAMI.2016.2537340.
Chicago author-date (all authors)
Wu, Di, Lionel Pigou, Pieter-Jan Kindermans, Nam LE, Ling Shao, Joni Dambre, and Jean-Marc Odobez. 2016. “Deep Dynamic Neural Networks for Multimodal Gesture Segmentation and Recognition.” IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE 38 (8): 1583–1597. doi:10.1109/TPAMI.2016.2537340.
Vancouver
1. Wu D, Pigou L, Kindermans P-J, LE N, Shao L, Dambre J, et al. Deep dynamic neural networks for multimodal gesture segmentation and recognition. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE. 2016;38(8):1583–97.
IEEE
[1] D. Wu et al., “Deep dynamic neural networks for multimodal gesture segmentation and recognition,” IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, vol. 38, no. 8, pp. 1583–1597, 2016.
@article{7223133,
  abstract     = {{This paper describes a novel method called Deep Dynamic Neural Networks (DDNN) for multimodal gesture recognition. A semi-supervised hierarchical dynamic framework based on a Hidden Markov Model (HMM) is proposed for simultaneous gesture segmentation and recognition, where skeleton joint information, depth, and RGB images are the multimodal input observations. Unlike most traditional approaches that rely on the construction of complex handcrafted features, our approach learns high-level spatio-temporal representations using deep neural networks suited to the input modality: a Gaussian-Bernoulli Deep Belief Network (DBN) to handle skeletal dynamics, and a 3D Convolutional Neural Network (3DCNN) to manage and fuse batches of depth and RGB images. This is achieved through the modeling and learning of the emission probabilities of the HMM required to infer the gesture sequence. This purely data-driven approach achieves a Jaccard index score of 0.81 in the ChaLearn LAP gesture spotting challenge. The performance is on par with a variety of state-of-the-art hand-tuned feature-based approaches and other learning-based methods, thereby opening the door to the use of deep learning techniques to further explore multimodal time series data.}},
  author       = {{Wu, Di and Pigou, Lionel and Kindermans, Pieter-Jan and LE, Nam and Shao, Ling and Dambre, Joni and Odobez, Jean-Marc}},
  issn         = {{0162-8828}},
  journal      = {{IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE}},
  keywords     = {{hidden Markov models, gesture recognition, deep belief networks, convolutional neural networks, deep learning}},
  language     = {{eng}},
  number       = {{8}},
  pages        = {{1583--1597}},
  publisher    = {{IEEE}},
  title        = {{Deep dynamic neural networks for multimodal gesture segmentation and recognition}},
  doi          = {{10.1109/TPAMI.2016.2537340}},
  url          = {{http://dx.doi.org/10.1109/TPAMI.2016.2537340}},
  volume       = {{38}},
  year         = {{2016}},
}
