A parallel spatiotemporal saliency and discriminative online learning method for visual target tracking in aerial videos

PLoS One. 2018 Feb 13;13(2):e0192246. doi: 10.1371/journal.pone.0192246. eCollection 2018.

Abstract

Visual tracking in aerial videos is a challenging task in computer vision and remote sensing technologies due to appearance variation difficulties. Appearance variations are caused by camera and target motion, low resolution noisy images, scale changes, and pose variations. Various approaches have been proposed to deal with appearance variation difficulties in aerial videos, and amongst these methods, the spatiotemporal saliency detection approach reported promising results in the context of moving target detection. However, it is not accurate for moving target detection when visual tracking is performed under appearance variations. In this study, a visual tracking method is proposed based on spatiotemporal saliency and discriminative online learning methods to deal with appearance variations difficulties. Temporal saliency is used to represent moving target regions, and it was extracted based on the frame difference with Sauvola local adaptive thresholding algorithms. The spatial saliency is used to represent the target appearance details in candidate moving regions. SLIC superpixel segmentation, color, and moment features can be used to compute feature uniqueness and spatial compactness of saliency measurements to detect spatial saliency. It is a time consuming process, which prompted the development of a parallel algorithm to optimize and distribute the saliency detection processes that are loaded into the multi-processors. Spatiotemporal saliency is then obtained by combining the temporal and spatial saliencies to represent moving targets. Finally, a discriminative online learning algorithm was applied to generate a sample model based on spatiotemporal saliency. This sample model is then incrementally updated to detect the target in appearance variation conditions. Experiments conducted on the VIVID dataset demonstrated that the proposed visual tracking method is effective and is computationally efficient compared to state-of-the-art methods.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Humans
  • Learning*
  • Online Systems*
  • Videotape Recording*
  • Vision, Ocular*

Grants and funding

We would also like to acknowledge Universiti Kebangsaan Malaysia (UKM) financial support for this project via DIP-2014-037.