PWStableNet: Learning Pixel-wise Warping Maps for Video Stabilization

IEEE Trans Image Process. 2020 Jan 7. doi: 10.1109/TIP.2019.2963380. Online ahead of print.

Abstract

Videos captured by hand-held cameras are often perturbed by high-frequency jitter, so stabilizing them is an essential task. Many video stabilization methods have been proposed to stabilize shaky videos. However, most methods estimate a single global homography or several homographies based on fixed meshes to warp shaky frames into their stabilized views. In the presence of parallax, such a single homography or small set of homographies cannot handle depth variation well. In contrast to these traditional methods, we propose a novel video stabilization network, called PWStableNet, which produces pixel-wise warping maps, i.e., potentially a different warping for each pixel, and warps each pixel to its stabilized view. To the best of our knowledge, this is the first deep-learning-based pixel-wise video stabilization method. The proposed method is built upon a multi-stage cascade encoder-decoder architecture and learns pixel-wise warping maps from consecutive unstable frames. Inter-stage connections are also introduced to add the feature maps of an earlier stage to the corresponding feature maps of a later stage, which enables the later stage to learn a residual on top of the earlier features. This cascade architecture produces more precise warping maps at later stages. To ensure the correct learning of pixel-wise warping maps, a well-designed loss function guides the training of the proposed PWStableNet. The proposed method achieves performance comparable to traditional methods while offering stronger robustness and much faster processing. Moreover, it outperforms typical CNN-based stabilization methods, especially on videos with strong parallax. Code will be provided at https://github.com/mindazhao/pix-pix-warping-video-stabilization.
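To make the described architecture concrete, the following PyTorch sketch illustrates the two core ideas in the abstract: a cascade of encoder-decoder stages with inter-stage feature connections that regresses a per-pixel warping map from stacked unstable frames, and bilinear sampling that applies that map to a frame. This is a minimal illustration under assumed settings, not the authors' implementation; all layer widths, stage counts, and names (e.g., `PWStableNetSketch`, `warp_frame`) are hypothetical.

```python
# Illustrative sketch only: layer sizes, stage count, and input stacking are
# assumptions, not the paper's exact PWStableNet architecture.
import torch
import torch.nn as nn
import torch.nn.functional as F


def conv_block(in_ch, out_ch, stride=1):
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, 3, stride=stride, padding=1),
        nn.ReLU(inplace=True),
    )


class EncoderDecoder(nn.Module):
    """One cascade stage: downsample, then upsample back to input resolution."""

    def __init__(self, in_ch, feat_ch=32):
        super().__init__()
        self.enc1 = conv_block(in_ch, feat_ch)
        self.enc2 = conv_block(feat_ch, feat_ch * 2, stride=2)
        self.dec1 = conv_block(feat_ch * 2, feat_ch)
        # Two output channels: a per-pixel (dx, dy) warp offset.
        self.head = nn.Conv2d(feat_ch, 2, 3, padding=1)

    def forward(self, x, skip=None):
        f1 = self.enc1(x)
        f2 = self.enc2(f1)
        up = F.interpolate(f2, scale_factor=2, mode="bilinear", align_corners=False)
        f3 = self.dec1(up)
        if skip is not None:
            # Inter-stage connection: add features from the former stage so
            # this stage only needs to learn a residual refinement.
            f3 = f3 + skip
        return self.head(f3), f3


class PWStableNetSketch(nn.Module):
    def __init__(self, num_input_frames=2, num_stages=2):
        super().__init__()
        in_ch = 3 * num_input_frames  # stacked consecutive unstable RGB frames
        self.stages = nn.ModuleList(EncoderDecoder(in_ch) for _ in range(num_stages))

    def forward(self, frames):
        warp, feats = None, None
        for stage in self.stages:
            warp, feats = stage(frames, skip=feats)
        return warp  # (B, 2, H, W): a potentially different warp per pixel


def warp_frame(frame, warp):
    """Apply a per-pixel offset map to a frame via bilinear sampling."""
    b, _, h, w = frame.shape
    ys, xs = torch.meshgrid(
        torch.linspace(-1, 1, h), torch.linspace(-1, 1, w), indexing="ij"
    )
    base = torch.stack((xs, ys), dim=-1).expand(b, h, w, 2)
    grid = base + warp.permute(0, 2, 3, 1)  # offsets in normalized coordinates
    return F.grid_sample(frame, grid, align_corners=False)


frames = torch.randn(1, 6, 64, 64)         # two stacked RGB frames
net = PWStableNetSketch()
warp = net(frames)                          # (1, 2, 64, 64)
stabilized = warp_frame(frames[:, :3], warp)
print(stabilized.shape)                     # torch.Size([1, 3, 64, 64])
```

Because the later stage receives the earlier stage's decoder features additively, its head only has to correct the residual error of the earlier warp estimate, which is why the cascade can yield more precise warping maps at later stages.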