FLGR: Fixed Length Gists Representation Learning for RNN-HMM Hybrid-Based Neuromorphic Continuous Gesture Recognition

Guang Chen; Jieneng Chen; Marten Lienen; Jörg Conradt; Florian Röhrbein; Alois C Knoll

doi:10.3389/fnins.2019.00073

Front Neurosci. 2019 Feb 12:13:73. doi: 10.3389/fnins.2019.00073. eCollection 2019.

Authors

Guang Chen¹ ², Jieneng Chen³, Marten Lienen², Jörg Conradt⁴, Florian Röhrbein², Alois C Knoll²

Affiliations

¹ College of Automotive Engineering, Tongji University, Shanghai, China.
² Chair of Robotics, Artificial Intelligence and Real-time Systems, Technische Universität München, Munich, Germany.
³ College of Electronics and Information Engineering, Tongji University, Shanghai, China.
⁴ Department of Computational Science and Technology, KTH Royal Institute of Technology, Stockholm, Sweden.

Abstract

Neuromorphic vision sensors are a novel passive, frameless sensing modality with several advantages over conventional cameras. Frame-based cameras have an average frame rate of 30 fps, which causes motion blur when capturing fast motion such as hand gestures. Rather than wastefully sending entire images at a fixed frame rate, neuromorphic vision sensors transmit only the local pixel-level changes induced by movement in a scene, at the moment they occur. This leads to advantageous characteristics, including low energy consumption, high dynamic range, a sparse event stream, and low response latency. In this study, a novel representation learning method was proposed: Fixed Length Gists Representation (FLGR) learning for event-based gesture recognition. Previous methods accumulate events into video frames over a time duration (e.g., 30 ms) to form an accumulated image-level representation. However, the accumulated-frame-based representation forgoes the event-driven paradigm of the neuromorphic vision sensor. New representations are urgently needed to fill the gap in non-accumulated-frame-based representation and to exploit the further capabilities of neuromorphic vision. The proposed FLGR is a sequence learned from a mixture density autoencoder, and it better preserves the nature of event-based data. FLGR has a fixed-length data format and is easy to feed to a sequence classifier. Moreover, an RNN-HMM hybrid was proposed to address the continuous gesture recognition problem. A recurrent neural network (RNN) was applied for FLGR sequence classification, while a hidden Markov model (HMM) was employed for localizing the candidate gestures and improving the result in a continuous sequence. A neuromorphic continuous hand gesture dataset (Neuro ConGD Dataset) with 17 hand gesture classes was developed for the neuromorphic research community. Hopefully, FLGR can inspire the study of event-based, highly efficient, high-speed, and high-dynamic-range sequence classification tasks.
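
For readers unfamiliar with the accumulated-frame baseline that the abstract contrasts FLGR against, the following is a minimal sketch of binning an event stream into fixed 30 ms frames. It is an illustration only, not the authors' code: the function name `accumulate_events`, the `(t, x, y, polarity)` event layout, and microsecond timestamps are all assumptions made for this example.

```python
import numpy as np

def accumulate_events(events, width, height, window_us=30_000):
    """Bin events into per-window count frames (hypothetical helper).

    events: sequence of (t, x, y, polarity) rows; t is assumed to be in
            microseconds. Polarity is ignored here for simplicity.
    Returns an int32 array of shape (num_windows, height, width) holding
    the number of events that fell on each pixel in each time window.
    """
    events = np.asarray(events, dtype=np.int64)
    if events.size == 0:
        return np.zeros((0, height, width), dtype=np.int32)

    # Shift timestamps so the first window starts at the earliest event.
    t = events[:, 0] - events[:, 0].min()
    window_idx = t // window_us

    frames = np.zeros((int(window_idx.max()) + 1, height, width), dtype=np.int32)
    # np.add.at accumulates correctly even when the same pixel is hit
    # several times within one window.
    np.add.at(frames, (window_idx, events[:, 2], events[:, 1]), 1)
    return frames


# Five synthetic events spanning three 30 ms windows.
events = [
    (0, 1, 2, 1),
    (10_000, 1, 2, 0),
    (29_999, 0, 0, 1),
    (30_000, 3, 1, 1),
    (65_000, 3, 1, 0),
]
frames = accumulate_events(events, width=4, height=3)
print(frames.shape)
```

Every event inside a window collapses onto one image, so the timing of individual events within those 30 ms is discarded; this is the loss of the event-driven paradigm that motivates a non-accumulated representation.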

Keywords: continuous gesture recognition; hidden Markov model; mixture density autoencoder; neuromorphic vision; recurrent neural network; representation learning.