Continuous sign language recognition via reinforcement learning

Abstract

In this paper, we propose an approach that applies the Transformer with reinforcement learning (RL) to the continuous sign language recognition (CSLR) task. The Transformer has an encoder-decoder structure: the encoder network encodes the sign video into a context vector representation, and the decoder network generates the target sentence word by word based on that context vector. To avoid the intrinsic defects of supervised learning (SL) in this task, namely exposure bias and the non-differentiability of the task metric, we propose to train the Transformer directly on the non-differentiable metric, i.e., word error rate (WER), through RL. Moreover, a policy gradient algorithm with a baseline, which we call Self-critic REINFORCE, is employed to reduce variance during training. Experimental results on the RWTH-PHOENIX-Weather benchmark verify the effectiveness of our method and demonstrate that it achieves comparable performance.
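
The training signal described above can be illustrated with a minimal sketch. It rests on assumptions not stated in the abstract: the reward is taken to be the negative WER of a decoded sentence, and the baseline is taken to be the reward of the model's own greedy decoding, as is common in self-critical sequence training. The function names `wer` and `self_critic_loss` are hypothetical, and the log-probabilities are plain floats standing in for the decoder's outputs.

```python
import math


def wer(reference, hypothesis):
    """Word error rate: word-level edit distance divided by reference length."""
    n, m = len(reference), len(hypothesis)
    if n == 0:
        raise ValueError("reference must be non-empty")
    # prev[j] holds the edit distance between reference[:i-1] and hypothesis[:j].
    prev = list(range(m + 1))
    for i in range(1, n + 1):
        curr = [i] + [0] * m
        for j in range(1, m + 1):
            substitution = prev[j - 1] + (reference[i - 1] != hypothesis[j - 1])
            deletion = prev[j] + 1      # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[m] / n


def self_critic_loss(reference, sampled, sampled_log_probs, greedy):
    """REINFORCE loss with a greedy-decoding baseline (assumed form).

    reference:         ground-truth words
    sampled:           sentence sampled from the decoder's distribution
    sampled_log_probs: log-probability of each sampled word
    greedy:            sentence from greedy decoding (assumed baseline)
    """
    reward_sampled = -wer(reference, sampled)   # assumption: reward = -WER
    reward_baseline = -wer(reference, greedy)
    advantage = reward_sampled - reward_baseline
    # Minimizing this raises the likelihood of samples that beat the baseline.
    return -advantage * sum(sampled_log_probs)


reference = "rain in the north".split()
sampled = "rain in north".split()
greedy = "sun in south".split()
log_probs = [math.log(0.5)] * len(sampled)
loss = self_critic_loss(reference, sampled, log_probs, greedy)
```

In an actual training loop the log-probabilities would be differentiable tensors from the Transformer decoder, while the advantage is treated as a constant, so the non-differentiable WER never needs a gradient.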

Publication
In International Conference on Image Processing