3DCD: Scene Independent End-to-End Spatiotemporal Feature Learning Framework for Change Detection in Unseen Videos

IEEE Trans Image Process. 2021;30:546-558. doi: 10.1109/TIP.2020.3037472. Epub 2020 Nov 24.

Abstract

Change detection is a fundamental task in computer vision and video processing applications. Recently, a number of supervised methods based on convolutional neural networks have reported high performance on the benchmark datasets. However, their success depends upon the availability of a certain proportion of annotated frames from the test video during training. Thus, their performance on completely unseen videos, i.e., in a scene independent setup, is undocumented in the literature. In this work, we present a scene independent evaluation (SIE) framework to test supervised methods on completely unseen videos and obtain generalized models for change detection. In addition, a scene dependent evaluation (SDE) is also performed to enable comparative analysis with existing approaches. We propose a fast (25 fps) and lightweight (0.13 million parameters, 1.16 MB model size) end-to-end 3D-CNN based change detection network (3DCD) with multiple spatiotemporal learning blocks. The proposed 3DCD consists of a gradual reductionist block for background estimation from past temporal history. It also performs motion saliency estimation, multi-schematic feature encoding-decoding, and finally foreground segmentation through several modular blocks. The proposed 3DCD outperforms existing state-of-the-art approaches evaluated in both the SIE and SDE setups on the benchmark CDnet 2014, LASIESTA and SBMI2015 datasets. To the best of our knowledge, this is the first attempt to present results in clearly defined SDE and SIE setups on three change detection datasets.
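As a rough illustration of why a 3D-CNN with narrow channel widths can stay in the 0.1-million-parameter range, the standard parameter count for a 3D convolutional layer is out_ch × (in_ch × kT × kH × kW + 1). The sketch below applies this formula to a hypothetical stack of narrow layers; the layer widths are illustrative assumptions, not the actual 3DCD configuration.

```python
# Illustrative sketch (not the authors' code): counting learnable
# parameters in a stack of 3D convolutional layers.

def conv3d_params(in_ch, out_ch, kernel=(3, 3, 3), bias=True):
    """Parameters in one 3D conv layer: out_ch * (in_ch * kT*kH*kW + bias)."""
    kt, kh, kw = kernel
    return out_ch * (in_ch * kt * kh * kw + (1 if bias else 0))

# Hypothetical narrow channel widths (in_ch, out_ch) per layer.
stack = [(3, 16), (16, 32), (32, 16), (16, 1)]
total = sum(conv3d_params(i, o) for i, o in stack)
print(total)  # well under a million parameters
```

Even with spatiotemporal (3×3×3) kernels, keeping the channel widths small keeps the total count far below typical 2D segmentation backbones, which is consistent with the compact model size reported above.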