
Synopsis on

VIDEO
SUMMARIZATION

By

Gaurav Kr. Yadav (1609113041)
Vaibhav Sinha (1609113122)
Kamini (1609113052)
Nishant Pratap Singh (1609113068)

Under the supervision of


Ms. ROSEY CHAUHAN

CERTIFICATE

This is to certify that the project report entitled “Video Summarization”, which is submitted
by Gaurav Kr. Yadav, Vaibhav Sinha, Kamini and Nishant Pratap Singh in partial
fulfilment of the requirements for the award of the degree of B.Tech. in the Department of
Information Technology of Dr. A.P.J. ABDUL KALAM TECHNICAL UNIVERSITY, is a
record of the candidates' own work carried out by them under my supervision. The
matter embodied in this report is original and has not been submitted for the award of
any other degree.

Supervisor: Ms. Rosey Chauhan

Date:

ACKNOWLEDGEMENT

It gives us a great sense of pleasure to present the report of the B. Tech Project undertaken
during B.Tech. Final Year. We owe a special debt of gratitude to Ms. Rosey Chauhan,
Department of Information Technology, JSS Academy of Technical Education, Noida for
her constant support and guidance throughout the course of our work.

Her sincerity, thoroughness and perseverance have been a constant source of inspiration for
us. It is only due to her cognizant efforts that our endeavours have seen the light of day.

We also take this opportunity to acknowledge the contribution of Dr. Vineeta Khemchandani,
Head of the Department of Information Technology, JSS Academy of Technical Education,
Noida, for her full support and assistance during the development of the project.

TABLE OF CONTENTS

CERTIFICATE
ACKNOWLEDGEMENT
1) Introduction
2) Motivation
3) Objective
4) Scope
5) Related Work
6) Technical Feasibility
7) References

1. INTRODUCTION

The basic unit of a video is the frame, a 2D array of RGB color values. When frames are
displayed at a rate beyond the perceptual cutoff (a typical frame rate is 30 frames per second),
they create an effect of continuity. Video summarization aims to create a summary that retains
the maximum amount of information in the fewest frames, which essentially means filtering out
the critical frames (or sequences of frames) from the original video.
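
As a small illustration of these ideas, the sketch below (a rough example assuming OpenCV is
installed; "input.mp4" is a placeholder file name, not part of the proposed system) opens a
video and reports its frame rate and frame count:

    # Minimal sketch: inspect a video's frame rate and frame count with OpenCV.
    # Assumes OpenCV (cv2) is installed; "input.mp4" is a placeholder path.
    import cv2

    cap = cv2.VideoCapture("input.mp4")
    fps = cap.get(cv2.CAP_PROP_FPS)                       # frames per second
    frame_count = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))  # total number of frames
    print(f"{frame_count} frames at {fps:.1f} fps "
          f"(about {frame_count / max(fps, 1):.1f} seconds)")
    cap.release()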

2. MOTIVATION

Rapid growth in digital media has led to an outburst of videos. Digital content, especially video,
has become easier to generate than to consume. This warrants the need for automatic video
summarization. Video summarization models can be used to produce trailers for movies and
TV shows, generate highlights for sports matches, or extract important events from surveillance
data.

3. OBJECTIVE

1. Shortening videos to their most relevant subsequences, allowing humans to browse large
repositories of videos efficiently.

2. Enhancing information retrieval from videos by extracting important events from
surveillance data.

4. SCOPE

4.1. Boundaries

Video summarization is typically treated as an unsupervised learning problem; however, some
recent work models it as a supervised learning problem. The typical hierarchy for a video is
scenes, then shots, and then frames. The summarization process is highly helpful in filtering out
the critical frames and thus enhances a human's ability to browse large repositories efficiently.
Since videographic content has grown exponentially in recent times, this process is of utmost
importance and needs to be refined even further.

4.2. Functionalities

The entire system is monitored and controlled using OpenCV, Python 3 and TensorFlow. The
main components of the system are the image- and video-processing pipeline and the algorithms
used in the system. Preparing the video dataset, extracting frames using the video-processing
routines of OpenCV and converting the RGB frames to grayscale is the foremost task.
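
A minimal sketch of this step is given below (assuming OpenCV is installed; "input.mp4" and
the sampling interval are placeholders, not part of the proposed system):

    # Sketch: extract frames from a video and convert them to grayscale.
    import cv2

    cap = cv2.VideoCapture("input.mp4")
    gray_frames = []
    index = 0
    while True:
        ok, frame = cap.read()                 # frame is an H x W x 3 BGR array
        if not ok:
            break
        if index % 30 == 0:                    # keep roughly one frame per second at 30 fps
            gray_frames.append(cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY))
        index += 1
    cap.release()
    print(f"extracted {len(gray_frames)} grayscale frames")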

Models and Techniques proposed:

As noted above, video summarization is typically treated as an unsupervised learning problem,
although some recent work models it as a supervised one. Summarization can be accomplished
using two techniques: keyframe (static) video summarization, which outputs the "key" frames
from the video, and video skimming (dynamic video summarization), which outputs a collection
of shorter clips from the video. For static video summarization, the following methods will be
tested, subject to time constraints (a clustering-based sketch follows the list):

- Shot-boundary based
- Clustering-based methods for choosing frames (k-means, hierarchical)
- Perceptual-feature based (color histogram, motion based and object based)
- Semantic-feature based keyframe selection
- Scene-change detection
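
The sketch below illustrates the clustering- and color-histogram-based ideas listed above. It is
a rough example rather than the exact algorithm that will be used: it assumes OpenCV and
scikit-learn are available, that frames is a list of BGR frames read from the video, and that k is
a hand-picked number of keyframes.

    # Sketch: choose k keyframes by k-means clustering of per-frame color histograms.
    import cv2
    import numpy as np
    from sklearn.cluster import KMeans

    def color_histogram(frame, bins=16):
        # 3D BGR histogram, normalised and flattened into a feature vector
        hist = cv2.calcHist([frame], [0, 1, 2], None,
                            [bins, bins, bins], [0, 256, 0, 256, 0, 256])
        return cv2.normalize(hist, hist).flatten()

    def keyframe_indices(frames, k=5):
        features = np.array([color_histogram(f) for f in frames])
        km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(features)
        chosen = []
        for centre in km.cluster_centers_:
            # pick the frame whose histogram lies closest to each cluster centre
            chosen.append(int(np.argmin(np.linalg.norm(features - centre, axis=1))))
        return sorted(set(chosen))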

For dynamic video summarization, the following methods will be tested:

- Model-based summarization
- Time-compression based
- Text- and speech-recognition based

The effect of dimensionality-reduction techniques such as PCA and SVD, and of transforms
such as the discrete Haar wavelet transform and SIFT (Scale-Invariant Feature Transform), will
also be studied wherever relevant. Supervised video summarization may also be explored.
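
As a rough illustration of where dimensionality reduction could fit in, the sketch below applies
PCA to per-frame feature vectors before they are clustered. Random data stands in for real
descriptors, and 32 components is an arbitrary illustrative choice.

    # Sketch: reduce per-frame descriptors with PCA before clustering them.
    import numpy as np
    from sklearn.decomposition import PCA

    features = np.random.rand(300, 4096)    # placeholder: 300 frames x 4096-dim descriptors
    pca = PCA(n_components=32)
    reduced = pca.fit_transform(features)   # shape (300, 32)
    print(f"variance retained: {pca.explained_variance_ratio_.sum():.2%}")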

5. RELATED WORK

Video summarization has gained a lot of attention in recent times. The task can be accomplished
in primarily two ways: key frame extraction and video skimming. A lot of the initial work in
video summarization was done in the unsupervised setting, but the recent trend has shifted
towards learning from how humans generate video summaries, moving the problem into the
supervised domain. A brief review of the work done in video summarization is given below
according to this broad categorization.

5.1 Key Frame Extraction

Most of the initial work in this domain was done on key frame extraction, which, as the name
suggests, chooses the most informative frames from the video. These indexed frames are
supposed to be the ones that best summarize the video. Key frame extraction is primarily used
to obtain static summaries.
One of the staple algorithms for video summarization, VSUMM, posed key frame extraction as
a clustering problem [3]. Some of the other popular approaches to key frame extraction are
based on web image priors, while others use clustering-based algorithms on various low-level
features and change detection. The essential requirement in clustering-based algorithms is to
extract the features of interest that might make a frame worth being displayed in a summary.
Some methods suggest choosing video skims around these key frames to make the summary
more watchable [1]. Other variations of clustering-based algorithms use a different clustering
model, for example hierarchical clustering, to summarize a video.

5.2 Video Skims

Using key frames to summarize a video may be useful for automatically analyzing its content,
but it produces a discontinuous and rather unpleasant summary for human viewing. This calls
for summarizing a video in the form of skims. This, however, is a complex task, and it is often
more difficult for user videos, which lack structure; the semantic meaning of the video is
frequently required in such cases. Some work in this area tries to learn the meaning of the video
by detecting the motion of the camera and dividing the video accordingly, while other work does
so by detecting and clustering low-level features and detecting differences between frames.
Often the original algorithms for key frame extraction can be extended to video skimming by
choosing a continuous set of frames around the key frames to produce a video skim, as in the
sketch below.
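
The sketch below shows one simple way this extension might be done (a hypothetical helper, not
taken from the cited work): each keyframe index is expanded to a fixed window of surrounding
frames, and overlapping windows are merged into clips.

    # Sketch: turn keyframe indices into short skims by taking a window of
    # frames around each keyframe and merging overlapping windows.
    def skims_from_keyframes(keyframe_indices, total_frames, window=45):
        # window=45 frames is roughly 1.5 seconds at 30 fps (arbitrary choice)
        clips = []
        for idx in sorted(keyframe_indices):
            start = max(0, idx - window)
            end = min(total_frames, idx + window)
            if clips and start <= clips[-1][1]:   # overlaps the previous clip: merge
                clips[-1] = (clips[-1][0], max(clips[-1][1], end))
            else:
                clips.append((start, end))
        return clips

    print(skims_from_keyframes([100, 130, 900], total_frames=2000))
    # [(55, 175), (855, 945)]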

5.3 Supervised and Unsupervised Methods

Selecting the important shots or frames of a video is done using both unsupervised and
supervised methods. Unsupervised algorithms like [1] use manually defined factors for
comparing frames and then choosing the key ones. The evaluation metrics are generally based
on the intuition that the selected frames should be different from one another and placed far
apart in a feature space; the approach is to find features of interest in frames, cluster frames
with similar features, and finally choose parts of the video that are highly dissimilar. With the
increasing availability of datasets with human-extracted summaries, there has been some
promising recent work [1] in the supervised setting. Supervised algorithms require training
examples to teach the model which parts of a video are important. They generally need
annotated data, available either as the level of interest of each frame or as binary labels
indicating whether a frame should be included in the summary. In all methods, however, it is
important to take into account the sequential nature of a video and how humans perceive it.
The results of supervised learning are promising, although still not substantially different from
those of some unsupervised methods.

6. TECHNICAL FEASIBILITY

1. OpenCV

OpenCV (Open Source Computer Vision) is a library of programming functions mainly aimed
at real-time computer vision. It provides image- and video-processing routines and is used here
mainly for operations on images and videos.

2. NumPy

NumPy is a general-purpose array-processing package. It provides a high-performance
multidimensional array object and tools for working with these arrays, and it is the fundamental
package for scientific computing with Python. Since images are treated as multi-dimensional
arrays, all of the computation is based on NumPy.
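
A small sketch of this (the file name is hypothetical): a frame decoded by OpenCV is simply a
NumPy ndarray, so ordinary array operations apply to it directly.

    # Sketch: a decoded image is a NumPy array, so array operations work directly on it.
    import cv2
    import numpy as np

    img = cv2.imread("frame_0001.png")          # hypothetical file; ndarray of shape (H, W, 3)
    print(type(img), img.shape, img.dtype)      # <class 'numpy.ndarray'> (H, W, 3) uint8
    print("mean intensity:", np.mean(img))      # a simple per-frame statistic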

3. Imageio

Imageio is a Python library that provides an easy interface to read and write a wide range of
image data, including animated images, video, volumetric data, and scientific formats. It is
cross-platform and runs on Python 2.7 and 3.4+.

4. Scikit-learn

Scikit-learn is a free software machine learning library for the Python programming language.
It features various classification, regression and clustering algorithms, including support vector
machines, random forests, gradient boosting, k-means and DBSCAN, and is designed to
interoperate with the Python numerical and scientific libraries NumPy and SciPy.

5. Matplotlib

Matplotlib will be used for plotting and analysing the results.

7. REFERENCES

1. Video Summarization: Techniques and Applications - Zaynab El Khattabi, Youness Tabii,
Abdelhamid Benkaddour.

2. Diverse Sequential Subset Selection for Supervised Video Summarization - Boqing Gong,
Wei-Lun Chao, Kristen Grauman, Fei Sha.

3. Video Summarization - Ben Wing.

