
Business Context

The Hollywood industry produces around 600 movies a year (2015), with budgets
reaching as much as $2 bn for a single movie and an average running time of
around 134 minutes.
On the rating aggregator site IMDb, users rate these movies on a
weighted-average scale from 0 to 10, with 10 being the best and 0 the worst.
As of now, IMDb has around 3.9 million titles and around 7.4 million
personalities listed in its database.
The question then arises: can the greatness of a movie be decided before it is
released, based on a select few factors?

Business Objective
Ratings by both users and critics are subjective in nature and may be
skewed in favour of a relatively lower-quality movie.
Hence our objective is to determine objectively the greatness of a
movie, without relying on user/critic reviews and instincts.

Mining Objective
To find the variables that have a profound effect (high correlation) on
the quality or perceived quality of the movie and, as a result, make the
movie,
Note: Since multiple factors/variables might correlate with the quality
and/or perceived quality of the movie, the analysis might lead to
multidimensional contingency tables (cross tabs), and thus a significant
correlation/interdependency with any one specific variable may not be
achieved.
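As a rough illustration of the cross tabs mentioned in the note, the sketch below counts records for every combination of two or three categorical variables. The field names (`color`, `content_rating`, `rating_band`), the helper `cross_tab` and all the records are made up for illustration; they are assumptions, not values from the data set.

```python
from collections import Counter

# Made-up illustrative records; field names and values are assumptions,
# not rows from the actual data set.
movies = [
    {"color": "Color", "content_rating": "PG-13", "rating_band": "high"},
    {"color": "Color", "content_rating": "R", "rating_band": "low"},
    {"color": "Black and White", "content_rating": "R", "rating_band": "high"},
    {"color": "Color", "content_rating": "PG-13", "rating_band": "high"},
    {"color": "Color", "content_rating": "R", "rating_band": "high"},
]


def cross_tab(records, *keys):
    """Count records for every observed combination of the given keys."""
    return Counter(tuple(record[key] for key in keys) for record in records)


# Two-way table: content rating against rating band.
two_way = cross_tab(movies, "content_rating", "rating_band")

# Three-way table: each added variable multiplies the number of possible
# cells, so the count per cell shrinks.
three_way = cross_tab(movies, "color", "content_rating", "rating_band")

for cell, count in sorted(three_way.items()):
    print(cell, count)
```

Each extra variable spreads the same records over more cells, which is one reason a clear relationship with any single variable can become hard to establish.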
Questions that need to be answered besides the primary objective:
1. Do the Facebook likes of the actors and director matter to the
popularity/greatness of the movie?
2. Do movie tags, genre tags, plot keywords etc. relate to the popularity of
the movie?
3. Does the number of human faces in the movie poster correlate with the
movie rating?
4. Do aspect ratio, movie duration and content rating have any
correlation with the movie rating?
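For the questions above that involve continuous variables, one possible starting point is the Pearson correlation coefficient. The sketch below is only an illustration: the helper `pearson` and the sample values are made up, and a categorical variable such as content rating would need a different measure.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations, and the two sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Made-up values for illustration only; not taken from the data set.
faces_in_poster = [0, 1, 2, 3, 4]
imdb_score = [7.1, 6.8, 6.9, 6.2, 6.0]

r = pearson(faces_in_poster, imdb_score)
print(f"faces in poster vs. IMDb score: r = {r:.3f}")
```

A value of r close to +1 or -1 suggests a strong linear relationship, while a value near 0 suggests little linear relationship; it says nothing about causation.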

Details of the data set are given below.

Source- IMDb

28 variables
1. Movie Title- Nominal
2. Colour- Categorical
3. Number Critic For Review- Continuous
4. Movie Facebook Like- Continuous
5. Duration- Continuous
6. Director Name- Nominal
7. Director Facebook Like- Continuous
8. Actor_3_Name- Nominal
9. Actor_3_Facebook_Likes- Continuous
10. Actor_2_Name- Nominal
11. Actor_2_Facebook_Likes- Continuous
12. Actor_1_Name- Nominal
13. Actor_1_Facebook_Likes- Continuous
14. Gross- Continuous
15. Genres- Nominal
16. Number Voted Users- Continuous
17. Cast Total Facebook Likes- Continuous
18. Facenumber In Poster- Continuous
19. Plot Keywords
20. Movie Imdb Link
21. Number User For Reviews
22. Language- Nominal
23. Country- Nominal
24. Content Rating- Categorical
25. Budget- Continuous
26. Title Year- Nominal
27. Imdb Score- Continuous
28. Aspect Ratio
Number of records: 5043 movies have been analysed on the 28
variables.

Explanation Of Dataset

Attributes to be used: Out of the 28 variables, we plan to
exclude the variables title_year and movie_link, and we are
focussing only on movies released in the USA, which makes the
country variable irrelevant.
Derived Data: Gross Revenue/ Budget, to give a realistic
measure of a movie's return relative to its cost.
Most attributes are random in nature, except for content
rating and color, whose data points fall into a limited number
of categories.


(a) Gross Revenue and Budget are correlated to an extent, and
hence we take a derived value, i.e. Gross Revenue/ Budget, to
eliminate the correlation.


Dependent Variable: Gross Revenue/ Budget
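
The preparation steps described in this section could be sketched as follows. This is a minimal illustration under stated assumptions: the field names `gross`, `budget` and `country`, the helper `prepare`, the output key `gross_to_budget` and the sample records are hypothetical, and skipping rows with a missing or zero budget is a choice made here, not something the plan specifies.

```python
# Columns the plan excludes outright.
EXCLUDED = {"title_year", "movie_link"}


def prepare(records):
    """Keep USA movies, drop excluded columns, and add Gross Revenue/ Budget."""
    prepared = []
    for row in records:
        if row.get("country") != "USA":
            continue
        gross, budget = row.get("gross"), row.get("budget")
        if gross is None or not budget:
            # Assumption: the ratio is undefined without both values,
            # so such rows are skipped.
            continue
        # Country is constant after filtering, so it is dropped as well.
        kept = {
            key: value
            for key, value in row.items()
            if key not in EXCLUDED and key != "country"
        }
        kept["gross_to_budget"] = gross / budget
        prepared.append(kept)
    return prepared


# Made-up records for illustration only; not taken from the data set.
sample = [
    {"country": "USA", "gross": 300.0, "budget": 100.0, "title_year": 2010},
    {"country": "UK", "gross": 80.0, "budget": 40.0, "title_year": 2011},
    {"country": "USA", "gross": 120.0, "budget": None, "title_year": 2012},
    {"country": "USA", "gross": 50.0, "budget": 200.0, "title_year": 2013},
]

result = prepare(sample)
for row in result:
    print(row)
```

A ratio above 1 means the movie grossed more than its budget; a ratio below 1 means it grossed less.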
