Key Frame Extraction in the Summary Space

IEEE Trans Cybern. 2018 Jun;48(6):1923-1934. doi: 10.1109/TCYB.2017.2718579. Epub 2017 Jul 4.

Abstract

Key frame extraction is an efficient way to create a video summary that helps users quickly comprehend the video content. In general, key frames should be representative of the video content and, at the same time, diverse enough to reduce redundancy. Based on the assumption that video data lie near a subspace of a high-dimensional space, this paper proposes a new approach, named key frame extraction in the summary space. The approach aims to find the representative frames of the video and to filter out similar frames from the representative frame set. First, the video data are mapped to a high-dimensional space, called the summary space. Then, a new representation is learned for each frame by analyzing the intrinsic structure of the summary space. The learned representation reflects the representativeness of the frame and is used to select representative frames. Next, a perceptual hash algorithm is employed to measure the similarity between representative frames, and the key frame set is obtained by filtering out similar frames from the representative frame set. Finally, the video summary is constructed by arranging the key frames in temporal order. For evaluation, the ground truth is created by filtering out similar frames from human-created summaries and is used to assess the quality of the video summary. Experimental results on 80 videos from two datasets indicate that the proposed approach outperforms several traditional approaches.
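
The similarity-filtering step can be illustrated with a minimal sketch. The abstract does not say which perceptual hash variant or similarity threshold is used, so the code below assumes a simple average hash (block means thresholded at their overall mean) and a Hamming-distance cutoff. The names `average_hash`, `hamming`, `filter_similar`, and the default `max_distance=5` are hypothetical choices made for illustration, not details taken from the paper. Frames are assumed to be grayscale images given as 2D lists of intensities, at least 8 pixels in each dimension.

```python
def average_hash(frame, hash_size=8):
    """Return a hash_size*hash_size-bit average hash of a grayscale frame.

    Assumption: `frame` is a 2D list of intensities whose height and width
    are both at least `hash_size`, so every block contains a pixel.
    """
    height, width = len(frame), len(frame[0])
    cells = []
    for i in range(hash_size):
        r0, r1 = i * height // hash_size, (i + 1) * height // hash_size
        for j in range(hash_size):
            c0, c1 = j * width // hash_size, (j + 1) * width // hash_size
            block = [frame[r][c] for r in range(r0, r1) for c in range(c0, c1)]
            cells.append(sum(block) / len(block))
    mean = sum(cells) / len(cells)
    bits = 0
    for value in cells:
        # One bit per block: 1 if the block is brighter than the mean.
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits


def hamming(a, b):
    """Number of differing bits between two integer hashes."""
    return bin(a ^ b).count("1")


def filter_similar(frames, max_distance=5):
    """Return indices of frames kept after dropping near-duplicates.

    A frame is kept only if its hash differs from the hash of every
    already-kept frame by more than `max_distance` bits (an assumed,
    illustrative threshold). Indices come back in input order, so a
    temporally ordered input yields a temporally ordered result.
    """
    kept = []  # (index, hash) pairs of frames retained so far
    for index, frame in enumerate(frames):
        frame_hash = average_hash(frame)
        if all(hamming(frame_hash, h) > max_distance for _, h in kept):
            kept.append((index, frame_hash))
    return [index for index, _ in kept]
```

Calling `filter_similar` on the representative frames in temporal order returns the indices of the frames that would form the key frame set under these assumptions. A lower `max_distance` keeps more frames and a higher one filters more aggressively.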