Tommy Chheng
Department of Computer Science
University of California, Irvine
tchheng@uci.edu
Abstract
1 Introduction
In the past few years, YouTube and other media sources have pushed the bounds of video consumption. As media sources compete for more of a viewer's time every day, one possible alleviation is a video summarization system. A movie teaser is an example of a video summary. However, not everyone has the time to edit their videos into a concise version. See [2] for a more detailed description of the problem statement.
This paper highlights a fast and efficient algorithm that uses k-means clustering over RGB histograms to create a video summary. It is aimed particularly at low-quality media, specifically YouTube videos.
2 Approach
An outline of our system is as follows:
We selected RGB color histograms as our feature due to their global nature and speed of processing. In Rui's unified video summarization system [2], histograms are cited as a good trade-off between accuracy and speed. Additionally, Valdés's work [1] for the TRECVID 2007 Rushes Task found that video summarization methods based on histograms were comparable to those based on other features, but without the performance loss. One particular attribute of histograms is their global content: a histogram is a frequency representation that compresses the information of a video frame into a vector, where each entry is a count of a color. Histograms lose spatial information, but in a task like video summarization, the spatial information may not be needed. The majority of YouTube videos are low quality, so extracting more sophisticated features tends to be difficult. Histograms can perform well because they do not attempt to infer any semantic meaning from the actual segments.
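As a concrete illustration of the feature described above (a sketch, not the paper's exact implementation; the bin count and the quantization scheme are assumptions), an RGB histogram for a frame can be computed as:

```python
import numpy as np

def rgb_histogram(frame, bins_per_channel=8):
    """Compress a frame (H x W x 3 uint8 array) into a color-count vector.

    Each entry counts the pixels falling into one (R, G, B) bin; spatial
    layout is discarded, as discussed in the text.
    """
    # Cast up before arithmetic so bin indices do not overflow uint8.
    quantized = frame.astype(np.int64) // (256 // bins_per_channel)
    # Combine the three per-channel bins into a single bin index per pixel.
    idx = (quantized[..., 0] * bins_per_channel + quantized[..., 1]) \
          * bins_per_channel + quantized[..., 2]
    hist = np.bincount(idx.ravel(), minlength=bins_per_channel ** 3)
    # Normalize so frames from videos of different resolutions are comparable.
    return hist / hist.sum()
```

With 8 bins per channel the feature vector has 512 entries, regardless of frame size.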
For our task, we chose an unsupervised learning approach because of the lack of prior knowledge about Internet videos. We use k-means clustering to group related scenes together.
2.2.1 Algorithm
We want to group all the similar histograms into k clusters, where each histogram represents its corresponding video segment. Our version of the k-means algorithm is defined below:
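Since the original algorithm listing is not reproduced here, the following is a minimal sketch of Lloyd-style k-means over segment histograms with Euclidean distance (the initialization strategy, iteration cap, and convergence test are assumptions):

```python
import numpy as np

def kmeans(histograms, k=8, max_iters=100, seed=0):
    """Cluster segment histograms (an n x d array) into k groups.

    Returns (labels, centroids), where labels[i] is the cluster of segment i.
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(histograms, dtype=float)
    # Initialize centroids from k distinct randomly chosen segments.
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    labels = None
    for _ in range(max_iters):
        # Assignment step: each histogram joins its nearest centroid.
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        new_labels = dists.argmin(axis=1)
        if labels is not None and (new_labels == labels).all():
            break  # converged: no assignment changed
        labels = new_labels
        # Update step: recompute each non-empty centroid as its cluster mean.
        for j in range(k):
            members = X[labels == j]
            if len(members):
                centroids[j] = members.mean(axis=0)
    return labels, centroids
```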
Additionally, we experimented with cosine similarity and saw no noticeable difference in the clustering output.
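For reference, the cosine-similarity variant replaces the Euclidean assignment step with a similarity maximization (a sketch; the function name is illustrative):

```python
import numpy as np

def cosine_similarity(h1, h2):
    """Cosine similarity between two histogram vectors (1.0 = same direction)."""
    h1, h2 = np.asarray(h1, dtype=float), np.asarray(h2, dtype=float)
    return float(h1 @ h2 / (np.linalg.norm(h1) * np.linalg.norm(h2)))
```

In the assignment step, a segment would then join the centroid with the highest similarity rather than the lowest distance.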
3 Results
We selected k = 8 as our k-means parameter and used 20 segments for the output video.
3.1 Dataset
We processed the following YouTube videos in our system. All of these videos are 320x240.
1. MotoGP: Recent round of the world motorcycle racing series. This represents a
typical sports video.
2. Chad Vader: A typical comedy video.
3. Tour of LA beaches: A semi-edited amateur web video.
4. Man vs. Wild: An episode of the survival television series.
We see some interesting and useful results. In the Tour of LA beaches video shown in Figure 3, the clustering grouped the beach, boardwalk, and indoor scenes into separate clusters. This is a good summary for viewers because it shows all the major sections of the video clip.
When we clustered the MotoGP clip, the system was able to separate all the action footage from the pit-stand footage. This is particularly useful for viewers who only want to watch the race and not the pit stand.
In Figure 5, the Chad Vader video clip had all of its credits separated into one cluster. This has a negative side effect on video summary creation: since we use a round-robin approach for segment joining, the credits were dispersed throughout the summary.
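The round-robin joining mentioned above can be sketched as follows (the cluster ordering and the ordering of segments within each cluster are assumptions):

```python
def round_robin_join(clusters, num_segments=20):
    """Interleave segments from each cluster until the summary is full.

    `clusters` is a list of lists of segment ids, each in temporal order.
    Taking one segment from each cluster in turn is what disperses a
    'credits' cluster throughout the summary, as noted in the text.
    """
    summary, i = [], 0
    # Keep cycling while the summary is short and some cluster has segments left.
    while len(summary) < num_segments and any(len(c) > i for c in clusters):
        for cluster in clusters:
            if i < len(cluster) and len(summary) < num_segments:
                summary.append(cluster[i])
        i += 1
    return summary
```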
Our system was also able to correctly cluster the different segments of the Man vs. Wild episode. It particularly helped that the uniquely identifying segments had strong color similarity: when Bear (the host) was in the desert, the colors had higher intensity; similarly, when he was in the Florida Everglades, the colors were lower in intensity.
3.3 Performance
Figure 3: Tour of LA beaches clusters. Each row is a cluster.

The majority of our runtime is spent in processing overhead, including the histogram extraction. In each iteration of k-means clustering, the n frames are compared against k centroids. The number of iterations is roughly constant; it took approximately 10 iterations to converge. This gives us an O(kn) runtime for the clustering algorithm, which is certainly scalable for production use.
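As a back-of-the-envelope check (k = 8 and roughly 10 iterations come from the text; the frame count is an assumed example), the total number of histogram-to-centroid comparisons stays small:

```python
k = 8            # clusters (from Section 3)
iterations = 10  # approximate iterations to converge
n = 5000         # assumed example: number of frames in a clip

# Each iteration compares every frame's histogram against every centroid,
# so clustering costs O(k * n) per iteration.
comparisons = iterations * k * n
print(comparisons)  # 400000
```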
4 Problems
4.1 Repeated segments
We run into problems with repeated segments when dealing with static images in videos. When a static image is present for a long time, two or more segments are created from it. During clustering, all of the segments containing the static image are grouped into the same cluster. With round-robin segment fetching, these static images are then littered throughout the summary video. This was the case in the Tour of the LA Beaches video, as seen in Figure 2.
4.2 Background
In the MotoGP video clip, the majority of the segments consist of the road in the background, and our algorithm grouped most of these shots into one cluster. The intended behavior would be to separate the different teams into different clusters, because each team has a unique color scheme. However, the background dominated and grouped most of these segments together. It would be interesting future work to see whether two levels of clustering would help: one for the initial segments and a sub-clustering within each resulting set.
5 Conclusion
We have presented a system to automatically create a summarized video from a YouTube
video. K-means is a simple and effective method for clustering similar frames together.
Our system is modular in design, so future work can be developed by substituting in various components. Instead of histograms, future work could try other features such as motion vectors or even audio. However, we have demonstrated that a simple feature combined with a simple unsupervised learning technique can be a good starting point for a video summarization system.
Acknowledgments
Thanks to Deva Ramanan and the CS273 class for the experience in Machine Learning.
References
[1] Víctor Valdés and José M. Martínez. On-line video skimming based on histogram similarity. In TVS '07: Proceedings of the International Workshop on TRECVID Video Summarization, pages 94-98, New York, NY, USA, 2007. ACM.
[2] Yong Rui, Ziyou Xiong, Regunathan Radhakrishnan, Ajay Divakaran, and Thomas S. Huang. Unified framework for video summarization. MERL, Sept 2004. http://www.merl.com/publications/TR2004-115/.