Analysing Hotel Room Pricing in Indian Market

Final Report (Group G3)

Motivation, objective and description of the problem


Many factors affect the pricing of hotel rooms in India. The objective of this
analysis is to quantify the effect of different internal and external variables on the price of
rooms. Internal variables are hotel features such as free Wi-Fi, free breakfast and a swimming pool;
external variables include city rank, city population, hotel location and whether the city is a tourist destination.
Other factors that influenced our choice of study -

 Investment in hotels has been on the rise, so there may be considerable potential for
statistical analysis
 Authentic (from official sources) and sufficient secondary data is available for analysis

Hypothesis formulation and testing procedure


The proposed null hypothesis is that none of these external and internal factors has an effect on
hotel room pricing.
The alternate hypothesis is that at least some of these factors have a significant effect on hotel
room pricing.
The analysis is done in SPSS using descriptive analysis, parametric or non-parametric tests, and
regression.

Design of experiments
Since we are finding the effect of various factors on hotel room pricing, room rent is
the dependent variable and all other factors are independent variables.
In the first step, we check whether the dependent variable is normally distributed
using descriptive analysis: histogram, box plot, Q-Q plot, skewness and kurtosis test,
Kolmogorov-Smirnov test and Shapiro-Wilk test.
On the basis of the output of these tests, we decide whether to use parametric or
non-parametric tests in further analysis.
After the parametric or non-parametric analysis (whichever is chosen in the previous
step), we identify the few factors that affect hotel room pricing the most.
The next step is to build a regression model and compare the adjusted R-square values
associated with those few factors to find which factor influences pricing the most, in line
with our alternate hypothesis.
On the basis of the above analysis and results, we either reject or fail to reject the null hypothesis.

Data description
The dataset tracks hotel prices on 8 different dates at different hotels across different cities.
Dependent variable

 Room Rent: Rent for the cheapest room, double occupancy, in Indian Rupees.

Some hotels have more than one type of double occupancy room. For simplicity, we picked
the cheapest room with double occupancy.

External factors
Many external factors can potentially influence the Room Rent. The dataset captures some of
these external factors, as explained below.
 Date: We have hotel room rent data for the following 8 dates for each hotel:
{Dec 31, Dec 25, Dec 24, Dec 18, Dec 21, Dec 28, Jan 4, Jan 8}
If a hotel is sold out on a given date, we assume that its price on that date equals the
maximum price observed for that hotel across the dates for which prices are available.

 Is Weekend: We use ‘0’ to indicate weekdays, ‘1’ to indicate weekend dates (Sat /
Sun)

 Is New Year Eve: ‘1’ for Dec 31, ‘0’ otherwise

 City Name: Name of the City where the Hotel is located e.g. Mumbai

 Population: Population of the City in 2011

 City Rank: Rank order of City by Population (e.g. Mumbai = 0, Delhi = 1, and so on)

 Is Metro City: ‘1’ if City Name is {Mumbai, Delhi, Kolkata, Chennai}, ‘0’ otherwise

 Is Tourist Destination: We use ‘1’ if the city is primarily a tourist destination, ‘0’
otherwise. For example, Goa and Agra are primarily tourist destinations. We assume
that most people who visit Goa and Agra and stay in their hotels are in these cities
primarily for tourism.
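
The sold-out rule and the two date flags described above can be sketched as follows. This is only an illustration under stated assumptions: the helper names `fill_sold_out` and `date_flags`, the sample prices and the year 2016 are hypothetical and do not come from the dataset.

```python
from datetime import date

def fill_sold_out(prices):
    """Replace None (sold out) with the maximum price seen on the other dates."""
    available = [p for p in prices.values() if p is not None]
    if not available:
        return dict(prices)  # nothing to impute from, leave unchanged
    ceiling = max(available)
    return {d: (ceiling if p is None else p) for d, p in prices.items()}

def date_flags(d):
    """Return (is_weekend, is_new_year_eve) as 0/1 flags for a calendar date."""
    is_weekend = 1 if d.weekday() >= 5 else 0  # Monday=0 ... Saturday=5, Sunday=6
    is_new_year_eve = 1 if (d.month, d.day) == (12, 31) else 0
    return is_weekend, is_new_year_eve

# Hypothetical prices for one hotel; the year is an assumption for illustration.
prices = {
    date(2016, 12, 24): 4200,
    date(2016, 12, 25): 4500,
    date(2016, 12, 31): None,  # sold out on this date
    date(2017, 1, 4): 3800,
}
filled = fill_sold_out(prices)
flags = {d: date_flags(d) for d in filled}
```

Here the sold-out date receives the highest price observed for the same hotel, which matches the assumption stated for the Date variable.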

Internal factors
Many Hotel Features can influence the Room Rent. The dataset captures some of these
internal factors, as explained below.
 Hotel Name: e.g. Park Hyatt Goa Resort and Spa
 Star Rating: e.g. 5
 Airport: Distance between Hotel and closest major Airport
 Hotel Address: e.g. Arrossim Beach, Cansaulim, Goa
 Hotel Pin code: e.g. 403712
 Hotel Description: e.g. 5-star beachfront resort with spa, near Arossim Beach
 Free Wi-Fi: ‘1’ if the hotel offers Free Wi-Fi, ‘0’ otherwise
 Free Breakfast: ‘1’ if the hotel offers Free Breakfast, ‘0’ otherwise
 Hotel Capacity: e.g. 242. (Enter ‘0’ if not available)
 Has Swimming Pool: ‘1’ if they have a swimming pool, ‘0’ otherwise
Analysis and inference

Descriptive analysis
To check whether the room rent data is normally distributed, we performed descriptive
analysis on room rent. The analysis includes a histogram, box plot, Q-Q plot, skewness and kurtosis
test, Kolmogorov-Smirnov test and Shapiro-Wilk test.
The results of these tests are shown below:

 Skewness and Kurtosis test -


Descriptives: Room Rent

Measure                                         Statistic    Std. Error
Mean                                            5473.99      63.749
95% Confidence Interval for Mean, Lower Bound   5349.03
95% Confidence Interval for Mean, Upper Bound   5598.95
5% Trimmed Mean                                 4596.77
Median                                          4000.00
Variance                                        5.377E7
Std. Deviation                                  7.333E3
Minimum                                         299
Maximum                                         322500
Range                                           322201
Interquartile Range                             3868
Skewness                                        16.758       .021
Kurtosis                                        582.367      .043

At a 95% confidence level, the ratio of the statistic to its standard error should lie
between -1.96 and +1.96 for both skewness and kurtosis if the data is normal.
For skewness, ratio: 16.758 / 0.021 = 798 > 1.96
For kurtosis, ratio: 582.367 / 0.043 ≈ 13543.4 > 1.96

Neither ratio lies within this range, so we infer from this analysis that the data is not
normal in nature.
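
The ratio check can be reproduced with a few lines of Python. The sketch below also recomputes the two standard errors from the sample size using the usual large-sample formulas, on the assumption that SPSS reports these quantities and that n = 13232 (the df shown in the normality test output). The function names are illustrative.

```python
import math

def skewness_se(n):
    """Standard error of sample skewness for a sample of size n."""
    return math.sqrt(6.0 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))

def kurtosis_se(n):
    """Standard error of sample excess kurtosis for a sample of size n."""
    return 2.0 * skewness_se(n) * math.sqrt((n * n - 1) / ((n - 3) * (n + 5)))

n = 13232  # assumed sample size, taken from the df column of the normality test

# Ratios of statistic to reported standard error, as in the text above.
z_skew = 16.758 / 0.021
z_kurt = 582.367 / 0.043

# Both ratios must fall inside [-1.96, 1.96] for the normality assumption to hold.
looks_normal = abs(z_skew) <= 1.96 and abs(z_kurt) <= 1.96
```

Under these assumptions, `skewness_se(n)` and `kurtosis_se(n)` round to the .021 and .043 shown in the table, and `looks_normal` is `False`.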

 Kolmogorov – Smirnov test (K-S) –


In this test, the p-value (Sig. value) decides whether we reject the
null hypothesis, which states that the data is normally distributed.
If p > alpha, we fail to reject the null hypothesis
If p < alpha, we reject the null hypothesis
Tests of Normality

             Kolmogorov-Smirnov (a)
             Statistic    df       Sig.
Room Rent    .257         13232    .000

a. Lilliefors Significance Correction

Taking alpha = 0.05, the reported significance (.000, i.e. p < 0.001) is below alpha,
and hence we reject the null hypothesis.
So, we can infer that the data is not normal.
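
For readers without SPSS, the K-S distance itself is easy to compute. The sketch below is a minimal illustration, not the SPSS procedure: it measures the largest gap between the empirical distribution and a normal distribution fitted to the sample, and compares the reported statistic with the commonly quoted large-sample approximation to the 5% Lilliefors critical value, 0.886 / sqrt(n). The function names and the two synthetic samples are assumptions made for the example.

```python
import math
from statistics import NormalDist, mean, stdev

def ks_statistic_normal(sample):
    """K-S distance between the sample ECDF and a normal fitted to the sample."""
    xs = sorted(sample)
    n = len(xs)
    fitted = NormalDist(mean(xs), stdev(xs))
    d = 0.0
    for i, x in enumerate(xs):
        cdf = fitted.cdf(x)
        # The ECDF jumps from i/n to (i+1)/n at x, so check both sides of the jump.
        d = max(d, (i + 1) / n - cdf, cdf - i / n)
    return d

def lilliefors_critical_05(n):
    """Large-sample approximation to the 5% Lilliefors critical value."""
    return 0.886 / math.sqrt(n)

# Synthetic samples: evenly spaced normal quantiles, and a right-skewed version.
std_normal = NormalDist()
near_normal = [std_normal.inv_cdf((i + 0.5) / 200) for i in range(200)]
skewed = [math.exp(2.0 * q) for q in near_normal]

d_normal = ks_statistic_normal(near_normal)
d_skewed = ks_statistic_normal(skewed)

# Values reported in the Tests of Normality table.
reported_d, reported_n = 0.257, 13232
rejects_normality = reported_d > lilliefors_critical_05(reported_n)
```

The skewed sample gives a much larger distance than the near-normal one, and the reported statistic of .257 is far above the approximate critical value for a sample of this size.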
 Histogram –

The shape of the above histogram shows that the data is not normal in nature,
which supports the previous two analyses.
 Normal Q-Q plot –
In this plot, if the data points follow the trend line without much deviation,
we can infer that the data is normal in nature.
If the data points do not follow the trend line, the data is not normal in nature.
The data points in the above plot clearly do not follow the trend line, so we
can infer that the data is not normal in nature.

Hence from all the above tests and analysis, we can conclude that the data of room rent is not
normal in nature.

Making the data normal


We take the log of room rent (the dependent variable).
The new results are as follows:
 Skewness and Kurtosis test -
Here, the value of skewness is less than 1 and the value of kurtosis is less than 3,
so the transformed data can be treated as approximately normal.
 Histogram –

From the above histogram it can be inferred that the data is now approximately normal in nature.
 Normal Q-Q plot –

Since the data points in the above Q-Q plot follow the trend line, we can assume our data to be
normal.
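
The effect of the log transformation on skewness can be illustrated with a small example. The rents below are made up for illustration (299, 4000 and 322500 echo the reported minimum, median and maximum), and base-10 logarithms are an assumption, suggested by the use of "Antilog" in the regression equation.

```python
import math

def sample_skewness(values):
    """Moment-based skewness g1 = m3 / m2**1.5 (no small-sample correction)."""
    n = len(values)
    m = sum(values) / n
    m2 = sum((v - m) ** 2 for v in values) / n
    m3 = sum((v - m) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

# Illustrative rents only, not rows from the dataset.
rents = [299, 1500, 2500, 4000, 4500, 6000, 9000, 322500]
log_rents = [math.log10(r) for r in rents]  # base 10 is an assumption

skew_raw = sample_skewness(rents)
skew_log = sample_skewness(log_rents)
```

The single very large rent dominates `skew_raw`, while `skew_log` is much smaller because the logarithm compresses the right tail.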
Correlation between dependent and independent variables
There are many independent variables. We apply Pearson correlation to find the
correlation between each independent variable and the dependent variable. Factors
that are not significantly correlated with room rent can reasonably be removed from further analysis.

Here,
Null hypothesis – There is no correlation between the variables
Alternate hypothesis – There is a correlation between the variables

For the Weekend / No_Weekend factor, p > alpha, so we fail to reject the null
hypothesis that there is no correlation between this factor and room rent. So
we can ignore this factor in further regression analysis.
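
A minimal sketch of the Pearson correlation and its test statistic is given below. The six observations are hypothetical and only show the mechanics; for a sample as large as this dataset, the t statistic can be compared with roughly ±1.96 at alpha = 0.05.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def t_statistic(r, n):
    """t statistic for H0: no correlation, with n - 2 degrees of freedom."""
    return r * math.sqrt((n - 2) / (1.0 - r * r))

# Hypothetical values for illustration only.
star_rating = [2, 3, 3, 4, 5, 5]
is_weekend = [0, 1, 0, 1, 0, 1]
log_rent = [3.2, 3.4, 3.5, 3.7, 3.9, 4.0]

r_star = pearson_r(star_rating, log_rent)
r_weekend = pearson_r(is_weekend, log_rent)
```

With a 0/1 variable such as `is_weekend`, the same formula gives the point-biserial correlation, so dummy variables need no special handling here.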

Regression
Except for the one factor mentioned above, we apply regression analysis to all other
independent factors to find the best-fit regression line.

ANOVA (b)

Model 1       Sum of Squares    df       Mean Square    F          Sig.
Regression    552.403           10       55.240         1.014E3    .000 (a)
Residual      720.581           13221    .055
Total         1272.984          13231

a. Predictors: (Constant), No_NewYearEve, No_Free_Wifi, HotelCapacity,
Not_Tourist_Destination, No_Free_Breakfast, Airport, Population,
No_Swimming_pool, StarRating, Not_Metro_City

b. Dependent Variable: Log_Rent

Here,
Null hypothesis – There is no relation between the dependent variable and the independent variables
Alternate hypothesis – There is a relation between the dependent variable and the independent variables
Since the significance value (p-value) is less than alpha, we reject the null hypothesis.
Hence there is a relation between the dependent variable and the independent variables.
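
As a quick arithmetic check, the mean squares and the F statistic follow directly from the sums of squares and degrees of freedom in the ANOVA output. The variable names below are illustrative.

```python
# Values copied from the ANOVA output.
ss_regression, df_regression = 552.403, 10
ss_residual, df_residual = 720.581, 13221

# Mean square = sum of squares / degrees of freedom.
ms_regression = ss_regression / df_regression
ms_residual = ss_residual / df_residual

# F is the ratio of explained to unexplained variance per degree of freedom.
f_statistic = ms_regression / ms_residual
```

The result agrees with the reported F of 1.014E3 once rounded to four significant figures.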

Model Summary

Model    R           R Square    Adjusted R Square    Std. Error of the Estimate
1        .659 (a)    .434        .434                 .23346

a. Predictors: (Constant), No_NewYearEve, No_Free_Wifi,
HotelCapacity, Not_Tourist_Destination, No_Free_Breakfast, Airport,
Population, No_Swimming_pool, StarRating, Not_Metro_City

b. Dependent Variable: Log_Rent

From the above table, the adjusted R square value is 0.434.
This means that the independent variables together explain about 43.4% of the variance in the
dependent variable (log of room rent).
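
The Model Summary figures can be cross-checked against the regression and residual sums of squares (552.403 and 720.581) using the standard definitions. This is a sketch with illustrative variable names; n = 13232 observations and 10 predictors are read off the degrees of freedom in the ANOVA output.

```python
import math

ss_regression, ss_residual = 552.403, 720.581
n_obs, n_predictors = 13232, 10

ss_total = ss_regression + ss_residual
r_squared = ss_regression / ss_total
multiple_r = math.sqrt(r_squared)

# Adjusted R square penalises R square for the number of predictors.
adjusted_r_squared = 1 - (1 - r_squared) * (n_obs - 1) / (n_obs - n_predictors - 1)

# Standard error of the estimate is the square root of the residual mean square.
std_error_estimate = math.sqrt(ss_residual / (n_obs - n_predictors - 1))
```

With a sample this large relative to the number of predictors, the adjustment is tiny, which is why R square and adjusted R square both appear as .434.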

Coefficients (a)

Variable                   B            Std. Error    Beta     t          Sig.
(Constant)                 3.118        .017                   178.566    .000
Population                 -9.248E-9    .000          -.127    -11.552    .000
StarRating                 .194         .004          .473     49.636     .000
Airport                    .002         .000          .138     19.506     .000
HotelCapacity              -1.516E-5    .000          -.004    -.415      .678
Not_Tourist_Destination    -.085        .005          -.126    -17.459    .000
Not_Metro_City             -.009        .008          -.013    -1.223     .221
No_Swimming_pool           -.143        .006          -.221    -25.382    .000
No_Free_Wifi               .029         .008          .024     3.606      .000
No_Free_Breakfast          -.054        .004          -.083    -12.392    .000
No_NewYearEve              -.043        .006          -.045    -6.943     .000

B and Std. Error are the unstandardized coefficients; Beta is the standardized coefficient.

a. Dependent Variable: Log_Rent

This table gives the constant and the coefficients of all the independent variables in the
regression line.
So, from the above table, the regression line is:
Room Rent = Antilog[ 3.118 + 0.194(Star_Rating) + 0.002(Airport_Distance) -
0.085(Not_Tourist_Destination) - 0.009(Not_Metro_City) - 0.143(No_Swimming_Pool) +
0.029(No_Free_Wifi) - 0.054(No_Free_Breakfast) - 0.043(No_NewYearEve) ]
The Population (-9.248E-9) and HotelCapacity (-1.516E-5) terms from the table are not written out in this equation.
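
To show how the regression line turns hotel features into a predicted rent, here is a small sketch using the coefficients from the Coefficients table. Two assumptions are made: the antilog is base 10, and each "No_"/"Not_" variable equals 1 when the feature is absent and 0 when it is present. The function names and the example hotel are hypothetical.

```python
CONSTANT = 3.118
COEFFICIENTS = {
    "Population": -9.248e-9,
    "StarRating": 0.194,
    "Airport": 0.002,
    "HotelCapacity": -1.516e-5,
    "Not_Tourist_Destination": -0.085,
    "Not_Metro_City": -0.009,
    "No_Swimming_pool": -0.143,
    "No_Free_Wifi": 0.029,
    "No_Free_Breakfast": -0.054,
    "No_NewYearEve": -0.043,
}

def predict_log_rent(features):
    """Linear predictor on the log scale; missing features default to 0."""
    return CONSTANT + sum(
        coef * features.get(name, 0.0) for name, coef in COEFFICIENTS.items()
    )

def predict_rent(features):
    """Back-transform to rupees, assuming Log_Rent is a base-10 logarithm."""
    return 10 ** predict_log_rent(features)

# Hypothetical 3-star hotel with every other variable left at 0.
with_pool = predict_rent({"StarRating": 3})
without_pool = predict_rent({"StarRating": 3, "No_Swimming_pool": 1})
pool_ratio = without_pool / with_pool  # multiplicative effect of having no pool
```

Because the model is linear in the log of rent, each coefficient acts multiplicatively on rent: `pool_ratio` equals 10 raised to the power -0.143 regardless of the other features.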

From the Pearson correlation table, we can infer that Star_Rating, Hotel_Capacity
and Swimming_Pool are strongly correlated with Room Rent. All other factors are weakly
correlated.
So, we can also infer that these three factors have the highest effect on hotel room pricing.
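
As a complementary, regression-based view, the standardized coefficients (Beta) from the Coefficients table can be sorted by absolute value. This is only a sketch: a ranking by Beta reflects each factor's contribution with the other factors held fixed, so it need not match the ordering given by pairwise correlations.

```python
# Standardized coefficients (Beta) copied from the Coefficients table.
STANDARDIZED_BETA = {
    "Population": -0.127,
    "StarRating": 0.473,
    "Airport": 0.138,
    "HotelCapacity": -0.004,
    "Not_Tourist_Destination": -0.126,
    "Not_Metro_City": -0.013,
    "No_Swimming_pool": -0.221,
    "No_Free_Wifi": 0.024,
    "No_Free_Breakfast": -0.083,
    "No_NewYearEve": -0.045,
}

# Largest absolute Beta first.
ranked = sorted(STANDARDIZED_BETA.items(), key=lambda item: abs(item[1]), reverse=True)
ranked_names = [name for name, _ in ranked]
```

`ranked_names` lists the factors from the largest to the smallest absolute standardized coefficient.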
Summary and conclusion
From all the above analysis and inferences, we conclude that the star rating of the hotel,
hotel capacity and availability of a swimming pool are the factors that most affect hotel room
rent in the Indian market, and all three factors are positively correlated with room rent.

Study Group_G3 – Abhishek Saxena, Anmol Garg, Dilpreet Kaur, Kratgya Gupta, Richa
Agarwal, Vaishali Sharma
