Continuous sign language recognition is a weakly supervised problem of translating a video sequence into a sign gloss sequence, where the temporal boundary of each sign gloss is not annotated. The CNN-RNN-CTC framework has proven effective for this task by alternately estimating a pseudo label for each clip and retraining the feature extractor. The quality of the pseudo labels greatly affects the final performance. In contrast to existing methods, which select the label with the maximum posterior probability, we propose a dynamic pseudo label decoding method that finds a reasonable alignment path via dynamic programming. Our approach filters out apparently wrong labels and generates pseudo labels that conform to the natural word order of sign language. To further boost performance after iterative optimization, we introduce a temporal ensemble module equipped with BGRU and 1D-CNN to integrate features from different time scales. Experiments on two large-vocabulary continuous sign language benchmarks show the effectiveness of the proposed method.
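
To make the contrast between frame-wise maximum-posterior selection and order-preserving decoding concrete, the following is a minimal sketch of a dynamic-programming alignment. It is an illustration under simplifying assumptions, not the exact decoding procedure of this work: the function name `decode_pseudo_labels`, the toy posteriors, and the omission of the CTC blank label are all assumptions made for the example. Each clip is assigned to exactly one gloss, the gloss order of the sentence is preserved, and the path with the highest total log-posterior is returned.

```python
import math
from typing import List

NEG_INF = float("-inf")


def decode_pseudo_labels(log_probs: List[List[float]], glosses: List[int]) -> List[int]:
    """Assign one gloss to each clip along the best monotonic alignment path.

    log_probs[t][v] is the log-posterior of vocabulary entry v at clip t.
    glosses is the ordered gloss sequence of the sentence (hypothetical input).
    Simplification: no CTC blank label; every gloss covers at least one clip.
    """
    T, N = len(log_probs), len(glosses)
    if N == 0 or T < N:
        raise ValueError("need at least one clip per gloss")

    # score[t][n]: best log-probability of a path ending at clip t on gloss n.
    score = [[NEG_INF] * N for _ in range(T)]
    # advanced[t][n]: True if the best path moved from gloss n-1 to n at clip t.
    advanced = [[False] * N for _ in range(T)]
    score[0][0] = log_probs[0][glosses[0]]

    for t in range(1, T):
        for n in range(min(t + 1, N)):
            stay = score[t - 1][n]
            move = score[t - 1][n - 1] if n > 0 else NEG_INF
            if move > stay:
                best, advanced[t][n] = move, True
            else:
                best = stay
            if best > NEG_INF:
                score[t][n] = best + log_probs[t][glosses[n]]

    # Backtrack from the last clip, which must end on the last gloss.
    labels = [0] * T
    n = N - 1
    for t in range(T - 1, -1, -1):
        labels[t] = glosses[n]
        if advanced[t][n]:
            n -= 1
    return labels


# Toy posteriors over a 3-entry vocabulary for 4 clips (made-up numbers).
probs = [
    [0.1, 0.2, 0.7],
    [0.5, 0.1, 0.4],
    [0.2, 0.2, 0.6],
    [0.8, 0.1, 0.1],
]
log_probs = [[math.log(p) for p in row] for row in probs]

frame_argmax = [row.index(max(row)) for row in probs]
print(frame_argmax)                              # [2, 0, 2, 0]
print(decode_pseudo_labels(log_probs, [2, 0]))   # [2, 2, 2, 0]
```

In this toy case, frame-wise selection yields the labels 2, 0, 2, 0, which contradict the sentence order (gloss 2 followed by gloss 0), while the alignment path relabels the second clip so that the pseudo labels follow that order.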