VIDEO SUMMARIZATION
By
CERTIFICATE
This is to certify that the Project Report entitled “Video Summarization”, which is submitted
by Gaurav Kr. Yadav, Vaibhav Sinha, Kamini and Nishant Pratap Singh in partial
fulfilment of the requirement for the award of the degree of B. Tech. in the Department of
Information Technology of Dr. A.P.J. ABDUL KALAM TECHNICAL UNIVERSITY, is a
record of the candidates' own work carried out by them under my/our supervision. The
matter embodied in this thesis is original and has not been submitted for the award of
any other degree.
Date:
ACKNOWLEDGEMENT
It gives us great pleasure to present the report of the B. Tech. Project undertaken
during the B. Tech. Final Year. We owe a special debt of gratitude to Ms. Rosey Chauhan,
Department of Information Technology, JSS Academy of Technical Education, Noida, for
her constant support and guidance throughout the course of our work.
Her sincerity, thoroughness and perseverance have been a constant source of inspiration for
us. It is only through her cognizant efforts that our endeavours have seen the light of day.
TABLE OF CONTENTS
CERTIFICATE
ACKNOWLEDGEMENT
1) Introduction
2) Motivation
3) Objective
4) Scope
5) Related Work
6) Technical Feasibility
7) References
1. INTRODUCTION
The basic unit of a video is a frame, which is a 2D array of RGB pixel values. When frames are
shown at a sufficiently high rate (a typical frame rate for video is 30 frames per second), they
create an impression of continuity. Video summarization aims at creating a summary that carries
the maximum amount of information in the fewest frames, which essentially means selecting
the critical frames (or sequences of frames) from the original video.
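
The description above can be made concrete with a small sketch. It assumes a video held in memory as a NumPy array of shape (frames, height, width, 3); the array contents, the tiny frame size and the "every 30th frame" rule are placeholders chosen only for illustration, not a real importance criterion.

```python
import numpy as np

# Hypothetical toy video: 90 frames of 4x6 RGB pixels (values 0-255).
rng = np.random.default_rng(0)
video = rng.integers(0, 256, size=(90, 4, 6, 3), dtype=np.uint8)

fps = 30  # typical frame rate mentioned in the text
num_frames, height, width, channels = video.shape
duration_s = num_frames / fps  # 90 frames at 30 fps is 3 seconds

# A summary is a subset of the frame indices. Taking one frame per second
# is an arbitrary placeholder for a real selection criterion.
summary_indices = np.arange(0, num_frames, fps)
summary = video[summary_indices]

print(video.shape, duration_s, summary.shape)
```

Any summarization method discussed later can be seen as a different way of choosing `summary_indices`.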
2. MOTIVATION
Rapid growth in digital media has led to an explosion in the number of videos. Digital content,
especially video, has become easier to generate than to consume. This creates a need for automatic
video summarization. Video summarization models can be used to produce trailers for movies and
TV shows, generate highlights of sports matches, or extract important events from surveillance
data.
3. OBJECTIVE
1. Shortening videos to their most relevant subsequences, allowing humans to browse large
repositories of videos efficiently.
2. Enhancing information retrieval from videos by extracting important events from
surveillance data.
4. SCOPE
4.1. Boundaries
Video summarization is typically treated as an unsupervised learning problem, although some
recent work models it as a supervised learning problem. Videos are typically organized as a
hierarchy of scenes, then shots, and then frames. Summarization filters out the critical frames
and thus enhances a viewer's ability to browse large repositories efficiently. Since the volume
of video content has grown rapidly in recent times, this process is of utmost importance and
needs to be refined even further.
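
The scene, shot and frame hierarchy can be sketched as a small data structure. The class names `Shot` and `Scene` and the frame ranges below are hypothetical, chosen only to illustrate the nesting; they are not taken from any library.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Shot:
    """A contiguous run of frames, from start_frame to end_frame inclusive."""
    start_frame: int
    end_frame: int

    def num_frames(self) -> int:
        return self.end_frame - self.start_frame + 1


@dataclass
class Scene:
    """A group of consecutive shots."""
    shots: List[Shot] = field(default_factory=list)

    def num_frames(self) -> int:
        return sum(shot.num_frames() for shot in self.shots)


# Hypothetical 10-second clip at 30 fps (300 frames) split into two scenes.
video = [
    Scene(shots=[Shot(0, 89), Shot(90, 149)]),
    Scene(shots=[Shot(150, 299)]),
]
total_frames = sum(scene.num_frames() for scene in video)
```

A summary can then be expressed at any level of the hierarchy, for example as a list of selected shots rather than individual frames.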
4.2. Functionalities
The entire system is built using OpenCV, Python 3 and TensorFlow. The main components of the
system are image and video processing and the algorithms that operate on the processed frames.
The first tasks are preparing the video dataset, extracting the frames using the video
processing routines of OpenCV, and converting the RGB frames to grayscale.
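
A minimal sketch of the frame extraction and grayscale steps is given below. The function names `read_frames` and `bgr_to_gray` and the `step` parameter are assumptions made for illustration. OpenCV returns frames in BGR channel order, and `cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)` performs the conversion directly; the NumPy helper spells out the same kind of weighted sum (standard luma weights) so the arithmetic is visible.

```python
import numpy as np


def read_frames(path, step=1):
    """Read every `step`-th frame of a video file as BGR uint8 arrays."""
    import cv2  # imported lazily so bgr_to_gray can be used without OpenCV

    capture = cv2.VideoCapture(path)
    if not capture.isOpened():
        raise IOError(f"could not open video: {path}")
    frames = []
    index = 0
    while True:
        ok, frame = capture.read()  # frame is H x W x 3 in BGR order
        if not ok:
            break  # end of stream or read error
        if index % step == 0:
            frames.append(frame)
        index += 1
    capture.release()
    return frames


def bgr_to_gray(frame):
    """Weighted channel sum (0.299 R + 0.587 G + 0.114 B), rounded to uint8."""
    blue, green, red = frame[..., 0], frame[..., 1], frame[..., 2]
    gray = 0.114 * blue + 0.587 * green + 0.299 * red
    return np.round(gray).astype(np.uint8)
```

A typical use would be `gray_frames = [bgr_to_gray(f) for f in read_frames("input.mp4", step=5)]`, where the file name is a placeholder.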
Summarization approaches proposed to be studied:
- Model-based summarization
- Time-compression based summarization
- Text and speech recognition based summarization
The effect of dimensionality reduction techniques such as PCA and SVD, and of feature transforms
such as the discrete Haar wavelet transform and SIFT (Scale-Invariant Feature Transform), is also
proposed to be studied, wherever relevant. Supervised video summarization may also be
explored.
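
As an illustration of the dimensionality reduction step, the sketch below computes PCA through an SVD of the mean-centred frame matrix. The function name `pca_reduce`, the random 8x8 frames and the choice of five components are assumptions for the example only.

```python
import numpy as np


def pca_reduce(frames, num_components):
    """Project flattened frames onto their top principal components via SVD."""
    data = frames.reshape(len(frames), -1).astype(np.float64)
    centered = data - data.mean(axis=0)
    # Rows of vt are the principal directions, ordered by singular value.
    _, singular_values, vt = np.linalg.svd(centered, full_matrices=False)
    components = vt[:num_components]
    # Fraction of total variance carried by each principal direction.
    explained = singular_values**2 / np.sum(singular_values**2)
    return centered @ components.T, explained[:num_components]


rng = np.random.default_rng(0)
frames = rng.random((20, 8, 8))  # 20 hypothetical 8x8 grayscale frames
features, explained = pca_reduce(frames, num_components=5)
print(features.shape, explained.sum())
```

Each frame is reduced from 64 pixel values to 5 numbers, and `explained` indicates how much of the variance those components retain.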
5. RELATED WORK
Video summarization has gained a lot of attention in recent times. The task can be accomplished
in primarily two ways: key frame extraction and video skimming. Much of the initial work in
video summarization was done in the domain of unsupervised learning, but the recent trend has
shifted towards learning from how humans generate video summaries, moving the field into the
supervised domain. A brief review of the work on video summarization follows, organized
according to the broad categorization above.
Selecting the important shots or frames of a video is done using both unsupervised and supervised
methods. Unsupervised algorithms like [1] use manually defined factors for comparing frames
and then choosing the key ones. The evaluation metrics are generally based on the intuition that
the selected frames should differ from each other and lie far apart in a feature space. The
approach is to find features of interest in frames, cluster frames that have similar features,
and finally choose parts of the video that are highly dissimilar. With the increasing
availability of datasets containing human-generated summaries, there has been some promising
recent work [1] in the domain of supervised learning. Supervised algorithms require training
examples from which the model learns which parts of a video are important. The annotated data
takes the form of either a level of interest assigned to each frame or a binary label indicating
whether a frame is to be included in the summary. It is important, however, in all methods to
take into account the sequential nature of a video and how humans perceive a video.
The results of supervised learning are promising, although still not markedly different from
those of some of the unsupervised methods.
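
The unsupervised recipe described above (extract features, cluster similar frames, keep dissimilar representatives) can be sketched as follows. This is an illustrative assumption, not the method of [1]: it uses intensity histograms as features and a basic k-means written in NumPy, and the function names and the synthetic three-shot video are hypothetical.

```python
import numpy as np


def gray_histogram(frame, bins=16):
    """Normalised intensity histogram of a grayscale uint8 frame."""
    hist, _ = np.histogram(frame, bins=bins, range=(0, 256))
    return hist / hist.sum()


def select_key_frames(features, k, iterations=20):
    """Cluster feature vectors with a basic k-means and return the sorted
    indices of the frames closest to each cluster centre."""
    features = np.asarray(features, dtype=np.float64)
    # Farthest-point initialisation keeps the sketch deterministic.
    centers = [features[0]]
    for _ in range(1, k):
        dists = np.min(
            [np.linalg.norm(features - c, axis=1) for c in centers], axis=0
        )
        centers.append(features[np.argmax(dists)])
    centers = np.array(centers)

    for _ in range(iterations):
        dists = np.linalg.norm(features[:, None, :] - centers[None, :, :], axis=2)
        labels = np.argmin(dists, axis=1)
        for j in range(k):
            members = features[labels == j]
            if len(members):  # an empty cluster keeps its previous centre
                centers[j] = members.mean(axis=0)

    dists = np.linalg.norm(features[:, None, :] - centers[None, :, :], axis=2)
    # A set removes duplicates, so fewer than k indices may be returned.
    return sorted({int(np.argmin(dists[:, j])) for j in range(k)})


# Synthetic video: three "shots" of 10 frames each (dark, medium, bright).
rng = np.random.default_rng(0)
shots = [rng.integers(low, low + 30, size=(10, 16, 16)) for low in (10, 110, 210)]
frames = np.concatenate(shots).astype(np.uint8)

features = [gray_histogram(frame) for frame in frames]
key_frames = select_key_frames(features, k=3)
print(key_frames)
```

On this synthetic input the selection returns one frame from each shot. A supervised method would replace the clustering step with a model trained on the per-frame interest scores or binary labels described above.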
6. TECHNICAL FEASIBILITY
1. OpenCV
OpenCV (Open Source Computer Vision) is a library of programming functions mainly aimed at
real-time computer vision. In this project it is used for image and video processing
operations.
2. NumPy
3. Imageio
Imageio is a Python library that provides an easy interface to read and write a wide range of
image data, including animated images, video, volumetric data, and scientific formats. It is
cross-platform and runs on Python 2.7 and 3.4+.
4. Scikit-learn
Scikit-learn is a free software machine learning library for the Python programming language.
It features various classification, regression and clustering algorithms including support vector
machines, random forests, gradient boosting, k-means and DBSCAN, and is designed to
interoperate with the Python numerical and scientific libraries NumPy and SciPy.
5. Matplotlib
7. REFERENCES
2. Diverse Sequential Subset Selection for Supervised Video Summarization - Boqing Gong,
Wei-Lun Chao, Kristen Grauman, Fei Sha