Video summarization using textual descriptions for authoring video blogs


Authoring video blogs requires a video editing process, which is cumbersome for ordinary users. Video summarization can automate this process by extracting important segments from original videos. Because bloggers typically have certain stories for their blog posts, video summaries of a blog post should take the author’s intentions into account. However, most prior works address video summarization by mining patterns from the original videos without considering the blog author’s intentions. To generate a video summary that reflects the blog author’s intention, we focus on supporting texts in video blog posts and present a text-based method, in which the supporting text serves as a prior to the video summary. Given video and text that describe scenes of interest, our method segments videos and assigns to each video segment its priority in the summary based on its relevance to the input text. Our method then selects a subset of segments with content that is similar to the input text. Accordingly, our method produces different video summaries from the same set of videos, depending on the input text. We evaluated summaries generated from both blog viewers’ and authors’ perspectives in a user study. Experimental results demonstrate the advantages to the proposed text-based method for video blog authoring.

Multimedia Tools and Applications
Yuta Nakashima
Yuta Nakashima

Yuta Nakashima is a professor with Institute for Datability Science, Osaka University. His research interests include computer vision, pattern recognition, natural langauge processing, and their applications.