Investment in hotels has been on the rise, and there may be considerable potential for
statistical analysis
Authentic (from official sources) and sufficient secondary data are available for analysis
Design of experiments
Since we are studying the effect of various factors on hotel room pricing, room rent is the
dependent variable and all other factors are independent variables.
In the first step, we checked whether the dependent variable is normally distributed, using
descriptive analysis (histogram, box plot, Q-Q plot, skewness and kurtosis) and formal tests
(Kolmogorov-Smirnov and Shapiro-Wilk).
Based on the output of these tests, we decided whether to use parametric or non-parametric
tests in the subsequent analysis.
After running the parametric or non-parametric tests (whichever was chosen in the previous
step), we identified the few factors that affect hotel room pricing the most.
The next step is to build a regression model and compare the adjusted R-square values for
those few factors to determine which factor influences pricing the most; this is our alternate
hypothesis.
Based on the above analysis and results, we either reject or fail to reject the null hypothesis.
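The parametric versus non-parametric decision in the first two steps can be sketched as a small helper. This is a minimal illustration, assuming the ±1.96 cut-off on the skewness and kurtosis ratios (statistic divided by standard error) used later in this report is the deciding criterion; the function name `choose_test_family` is hypothetical, and in practice the Kolmogorov-Smirnov and Shapiro-Wilk results and the plots are weighed as well.

```python
def choose_test_family(skew_ratio: float, kurt_ratio: float, z_crit: float = 1.96) -> str:
    """Return 'parametric' if both ratios lie within +/- z_crit, else 'non-parametric'.

    skew_ratio and kurt_ratio are statistic / standard error (hypothetical inputs).
    z_crit = 1.96 corresponds to a 95% confidence level.
    """
    within = abs(skew_ratio) <= z_crit and abs(kurt_ratio) <= z_crit
    return "parametric" if within else "non-parametric"


# The ratios reported for raw room rent (798 and about 13543.4) fall far outside +/- 1.96.
print(choose_test_family(798.0, 13543.4))  # non-parametric
```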
Data description
The dataset tracks room prices at a number of hotels across several cities on 8 different dates.
Dependent variable
Room Rent: Rent for the cheapest room, double occupancy, in Indian Rupees.
Some hotels have more than one type of double occupancy room. For simplicity, we picked
the cheapest room with double occupancy.
External factors
Many external factors can potentially influence the Room Rent. The dataset captures some of
these external factors, as explained below.
Date: We have hotel room rent data for the following 8 dates for each hotel:
{Dec 31, Dec 25, Dec 24, Dec 18, Dec 21, Dec 28, Jan 4, Jan 8}
If a hotel is sold out on a given date, we assume that its room price on that date equals the
maximum price observed across the dates for which prices are available.
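The sold-out rule can be sketched per hotel as follows. This assumes each hotel's eight prices are held in a list in date order, with `None` standing in for a sold-out date; the function name and the sample numbers are illustrative, not taken from the dataset.

```python
from typing import Optional


def impute_sold_out(prices: list[Optional[float]]) -> list[float]:
    """Replace sold-out dates (None) with the maximum price seen on the other dates.

    prices holds one hotel's room rent for each sampled date, with None marking
    a date on which the hotel was sold out.
    """
    available = [p for p in prices if p is not None]
    if not available:
        raise ValueError("hotel is sold out on every date; no price to impute from")
    max_price = max(available)
    return [max_price if p is None else p for p in prices]


# One hotel across 8 dates, sold out on the 1st and 5th (illustrative numbers).
rents = [None, 4200.0, 3900.0, 3500.0, None, 4500.0, 3800.0, 3600.0]
print(impute_sold_out(rents))
# [4500.0, 4200.0, 3900.0, 3500.0, 4500.0, 4500.0, 3800.0, 3600.0]
```

Using the hotel's own maximum (rather than, say, a city-wide figure) keeps the imputed value on that hotel's price scale, which matches the rule stated above.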
Is Weekend: We use ‘0’ to indicate weekdays and ‘1’ to indicate weekend dates (Sat /
Sun)
City Name: Name of the City where the Hotel is located e.g. Mumbai
City Rank: Rank order of City by Population (e.g. Mumbai = 0, Delhi = 1, and so on)
Is Metro City: ‘1’ if City Name is {Mumbai, Delhi, Kolkata, Chennai}, ‘0’ otherwise
Is Tourist Destination: We use ‘1’ if the city is primarily a tourist destination, ‘0’
otherwise. For example, Goa and Agra are primarily tourist destinations. We assume
that most people who visit Goa and Agra and stay in their hotels are in these cities
primarily for tourism.
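Two of the indicator variables above can be derived mechanically from the date and the city name. A minimal sketch, assuming dates are Python `datetime.date` values; the report does not state the year of the sampled dates, so the year 2016 in the usage line is only an example, and the helper names are hypothetical.

```python
from datetime import date

# The four cities coded as metro cities in this dataset.
METRO_CITIES = {"Mumbai", "Delhi", "Kolkata", "Chennai"}


def is_weekend(d: date) -> int:
    """1 for Saturday/Sunday, 0 for weekdays (date.weekday(): Monday=0 ... Sunday=6)."""
    return 1 if d.weekday() >= 5 else 0


def is_metro_city(city_name: str) -> int:
    """1 if the city is one of the four metro cities, 0 otherwise."""
    return 1 if city_name in METRO_CITIES else 0


print(is_weekend(date(2016, 12, 31)), is_metro_city("Mumbai"), is_metro_city("Goa"))  # 1 1 0
```

Is Tourist Destination is a judgement about each city, so it is better kept as a hand-maintained lookup than derived in code.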
Internal factors
Many hotel features can influence the Room Rent. The dataset captures some of these
internal factors, as explained below.
Hotel Name: e.g. Park Hyatt Goa Resort and Spa
Star Rating: e.g. 5
Airport: Distance between Hotel and closest major Airport
Hotel Address: e.g. Arossim Beach, Cansaulim, Goa
Hotel Pin code: e.g. 403712
Hotel Description: e.g. 5-star beachfront resort with spa, near Arossim Beach
Free Wi-Fi: ‘1’ if the hotel offers Free Wi-Fi, ‘0’ otherwise
Free Breakfast: ‘1’ if the hotel offers Free Breakfast, ‘0’ otherwise
Hotel Capacity: e.g. 242. (Enter ‘0’ if not available)
Has Swimming Pool: ‘1’ if they have a swimming pool, ‘0’ otherwise
Analysis and inference
Descriptive analysis
To check whether the room rent data is normally distributed, we performed descriptive
analysis on room rent, including a histogram, box plot, Q-Q plot, skewness and kurtosis, the
Kolmogorov-Smirnov test and the Shapiro-Wilk test.
The descriptive statistics for room rent are shown below:
Statistic    Value
Median       4000.00
Variance     5.377 × 10^7
Minimum      299
Maximum      322500
Range        322201
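These summary statistics can be reproduced from a list of room rents with Python's standard `statistics` module. This is a sketch, assuming the rents are already cleaned into a list of numbers; `statistics.variance` computes the sample variance (n − 1 denominator), the form statistical packages usually report. As a check on the table, Range = Maximum − Minimum = 322500 − 299 = 322201. The helper name and the sample list are illustrative.

```python
import statistics


def describe(rents: list[float]) -> dict[str, float]:
    """Median, sample variance, minimum, maximum and range of room rents."""
    lo, hi = min(rents), max(rents)
    return {
        "median": statistics.median(rents),
        "variance": statistics.variance(rents),  # sample variance (n - 1 denominator)
        "minimum": lo,
        "maximum": hi,
        "range": hi - lo,
    }


# Illustrative values only (not the dataset), chosen to share the table's min, max and median.
print(describe([299.0, 2500.0, 4000.0, 5200.0, 322500.0]))
```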
Since we assume a 95% confidence level, the ratio of each statistic to its standard error, for
both skewness and kurtosis, should lie between -1.96 and +1.96 for our data to be considered
normal.
For skewness, ratio: $\dfrac{16.758}{0.021} = 798 > 1.96$
For kurtosis, ratio: $\dfrac{582.367}{0.043} \approx 13543.4 > 1.96$
Both ratios lie far outside this range, so we infer from this analysis that the data is not
normal.
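The ratio test can be sketched with the standard library. This assumes the bias-adjusted skewness and kurtosis and the standard-error formulas that SPSS documents for its descriptive output; `skew_kurt_ratios` is an illustrative helper name, not part of any library.

```python
import math
import statistics


def skew_kurt_ratios(x: list[float]) -> tuple[float, float]:
    """Return (skewness / SE, kurtosis / SE) using bias-adjusted estimators."""
    n = len(x)
    if n < 4:
        raise ValueError("need at least 4 observations")
    mean = statistics.fmean(x)
    s = statistics.stdev(x)  # sample standard deviation
    z = [(v - mean) / s for v in x]
    skew = n / ((n - 1) * (n - 2)) * sum(t ** 3 for t in z)
    # Excess kurtosis: 0 for a normal distribution.
    kurt = (n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * sum(t ** 4 for t in z)
            - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3)))
    se_skew = math.sqrt(6 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))
    se_kurt = 2 * se_skew * math.sqrt((n ** 2 - 1) / ((n - 3) * (n + 5)))
    return skew / se_skew, kurt / se_kurt


# The ratios reported above follow the same arithmetic:
print(round(16.758 / 0.021, 1), round(582.367 / 0.043, 1))  # 798.0 13543.4
```

Because the standard errors shrink roughly like 1/sqrt(n), a large sample turns even modest departures from normality into large ratios, so the plots are a useful complement.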
Tests of Normality: Kolmogorov-Smirnov (Statistic, df, Sig.)
The shape of the histogram above also indicates that the data is not normal, which supports
our previous two analyses.
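The Kolmogorov-Smirnov statistic in the normality table compares the empirical distribution of room rent with a normal distribution. A minimal sketch using `statistics.NormalDist` follows; it assumes the normal's mean and standard deviation are estimated from the sample, which is why packages such as SPSS report a Lilliefors-corrected significance. The sketch computes only the statistic D, not that corrected p-value, and the function name is illustrative.

```python
import statistics


def ks_statistic_normal(x: list[float]) -> float:
    """Kolmogorov-Smirnov D: largest gap between the empirical CDF and a normal CDF
    whose mean and standard deviation are estimated from the same data."""
    xs = sorted(x)
    n = len(xs)
    fitted = statistics.NormalDist(statistics.fmean(xs), statistics.stdev(xs))
    d = 0.0
    for i, v in enumerate(xs, start=1):
        f = fitted.cdf(v)
        # The ECDF jumps at v: compare with its value just after (i/n) and just before ((i-1)/n).
        d = max(d, i / n - f, f - (i - 1) / n)
    return d


print(round(ks_statistic_normal([1.0, 2.0, 3.0, 4.0, 5.0]), 3))
```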
Normal Q-Q plot –
In a normal Q-Q plot, if the data points follow the trend line and do not deviate much from it,
we can infer that the data is normal.
If the data points do not follow the trend line, the data is not normal.
Since the data points in the plot above clearly do not follow the trend line, we infer that the
data is not normal.
Hence, from all the above tests and analyses, we conclude that room rent is not normally
distributed.
After log-transforming room rent, the histogram above indicates that the data is now approximately normal.
Normal Q-Q plot –
Since the data points in the Q-Q plot above follow the trend line, we can assume the
transformed data to be normal.
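The regression later in this report is written with an antilog, which indicates that the model was fit on the logarithm of room rent. A sketch of that transformation follows, assuming base-10 logs (the report does not state the base) and using illustrative numbers rather than the dataset; it shows how the log shrinks the long right tail that made the raw rents non-normal. The helper names are hypothetical.

```python
import math
import statistics


def sample_skewness(x: list[float]) -> float:
    """Bias-adjusted (Fisher-Pearson) sample skewness."""
    n = len(x)
    mean, s = statistics.fmean(x), statistics.stdev(x)
    return n / ((n - 1) * (n - 2)) * sum(((v - mean) / s) ** 3 for v in x)


def log_transform(rents: list[float]) -> list[float]:
    """Base-10 log of each rent; every rent must be positive."""
    return [math.log10(r) for r in rents]


# Illustrative right-skewed rents: the log pulls in the long right tail.
rents = [1000.0, 2000.0, 3000.0, 4000.0, 6000.0, 10000.0, 100000.0]
print(sample_skewness(rents) > sample_skewness(log_transform(rents)))  # True
```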
Correlation between dependent and independent variables
There are many independent variables. We apply Pearson correlation to measure the correlation
between each independent variable and the dependent variable. Factors that are not
significantly correlated can reasonably be removed from further analysis.
Here,
Null hypothesis – There is no correlation between the variables
Alternate hypothesis – There is a significant correlation between the variables
Since p > alpha for the Weekend factor (Weekend / No_Weekend), we fail to reject the null
hypothesis that there is no correlation between this factor and room rent, so we exclude it
from the regression analysis.
Regression
Excluding the factor mentioned above, we apply regression analysis to all other independent
factors to find the best-fit regression line.
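Multiple linear regression of this kind is usually fit by ordinary least squares. The following is a minimal standard-library sketch, not the tool the authors used: it assumes the predictors are already coded as numbers (0/1 for the binary factors) and that the response is log room rent; `ols_fit` is a hypothetical helper.

```python
def ols_fit(X: list[list[float]], y: list[float]) -> list[float]:
    """Ordinary least squares: return [intercept, b1, ..., bk] minimising squared error.

    X holds one row of k predictor values per observation. The normal equations
    (A^T A) beta = A^T y, with A = [1 | X], are solved by Gauss-Jordan elimination
    with partial pivoting.
    """
    rows = [[1.0] + [float(v) for v in r] for r in X]  # prepend intercept column
    p = len(rows[0])
    # Augmented matrix [A^T A | A^T y].
    m = [[sum(a[i] * a[j] for a in rows) for j in range(p)]
         + [sum(a[i] * yi for a, yi in zip(rows, y))] for i in range(p)]
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("predictors are collinear; drop a redundant column")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(p):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [a - factor * b for a, b in zip(m[r], m[col])]
    # The coefficient block is now diagonal, so each unknown is rhs / diagonal.
    return [m[i][p] / m[i][i] for i in range(p)]


# Toy data generated from y = 1 + 2*x1 + 3*x2 (illustrative only).
print(ols_fit([[1, 0], [2, 1], [3, 0], [4, 1], [5, 3]], [3, 8, 7, 12, 20]))
```

Only one dummy per binary factor (for example Not_Metro_City, not also a Metro_City column) can enter alongside the intercept; including both would make the columns perfectly collinear, so the normal equations would have no unique solution.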
ANOVA
Here,
Null hypothesis – There is no relation between the dependent variable and the independent variables
Alternate hypothesis – There is a relation between the dependent variable and the independent variables
Since the significance value (p-value) is less than alpha, we reject the null hypothesis.
Hence there is a relation between the dependent variable and the independent variables.
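The ANOVA table tests the regression as a whole, and its F statistic can be recovered from R² alone, as in this sketch. The helper name and the numbers in the usage line are illustrative; the p-value then comes from the F distribution with (k, n − k − 1) degrees of freedom, for example via `scipy.stats.f.sf(F, k, n - k - 1)`.

```python
def regression_f(r_squared: float, n: int, k: int) -> float:
    """Overall F statistic for a regression with k predictors fit to n observations.

    F = (R^2 / k) / ((1 - R^2) / (n - k - 1)); a large F (small p-value) rejects
    H0 that all slope coefficients are zero.
    """
    if not 0 <= r_squared < 1:
        raise ValueError("R^2 must be in [0, 1)")
    if n - k - 1 <= 0:
        raise ValueError("need more observations than predictors + 1")
    return (r_squared / k) / ((1 - r_squared) / (n - k - 1))


print(regression_f(0.5, 12, 1))
```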
Model Summary
From the above table, the adjusted R square value is 0.43.
This means that, taken together, the independent variables explain about 43.4% of the
variation in the dependent variable (room rent, modelled on the log scale).
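Adjusted R² penalises R² for the number of predictors k, given n observations; when n is much larger than k the adjustment is tiny, so adjusted R² and R² are nearly equal. A sketch of the formula, with an illustrative helper name and made-up inputs:

```python
def adjusted_r_squared(r_squared: float, n: int, k: int) -> float:
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - k - 1)."""
    if n - k - 1 <= 0:
        raise ValueError("need more observations than predictors + 1")
    return 1 - (1 - r_squared) * (n - 1) / (n - k - 1)


# Small n: a visible penalty. Large n: almost none.
print(round(adjusted_r_squared(0.5, 11, 1), 4), round(adjusted_r_squared(0.5, 10000, 8), 4))
```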
Coefficients (unstandardized and standardized)
This table gives the constant and the coefficients of all the independent variables in the
regression equation.
From the table above, the regression equation is:
Room Rent = Antilog[3.118 + 0.194(Star_Rating) + 0.002(Airport_Distance) -
0.085(Not_Tourist_destination) - 0.09(Not_Metro_City) - 0.143(No_Swimming_Pool) +
0.029(No_Free_Wifi) - 0.054(No_Free_Breakfast) - 0.043(No_NewYearEve)]
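To read the equation, plug a hotel's attributes into the linear part and take the antilog. The sketch below assumes a base-10 antilog (the report does not state the base), 0/1 coding for the dummy variables, and Airport_Distance in whatever unit the dataset uses; the dictionary keys mirror the variable names in the equation, and `predict_rent` is a hypothetical helper.

```python
# Coefficients copied from the regression equation above.
COEFFICIENTS = {
    "Star_Rating": 0.194,
    "Airport_Distance": 0.002,
    "Not_Tourist_destination": -0.085,
    "Not_Metro_City": -0.09,
    "No_Swimming_Pool": -0.143,
    "No_Free_Wifi": 0.029,
    "No_Free_Breakfast": -0.054,
    "No_NewYearEve": -0.043,
}
INTERCEPT = 3.118


def predict_rent(features: dict[str, float]) -> float:
    """Predicted room rent (INR) as the antilog of the linear predictor.

    ASSUMPTION: 'Antilog' is base 10. Features not supplied default to 0.
    """
    log_rent = INTERCEPT + sum(c * features.get(name, 0.0) for name, c in COEFFICIENTS.items())
    return 10 ** log_rent


# Effect of having no swimming pool, all else equal: a multiplicative factor.
print(round(predict_rent({"No_Swimming_Pool": 1.0}) / predict_rent({}), 2))  # 0.72
```

Because the model is on the log scale, each coefficient acts multiplicatively: under the base-10 assumption, having no swimming pool multiplies the predicted rent by 10^-0.143 ≈ 0.72, all else equal.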
From the Pearson correlation table above, we can infer that Star_Rating, Hotel_Capacity and
Swimming_Pool are strongly correlated with room rent, while all other factors are only weakly
correlated.
So we can also infer that these three factors have the greatest effect on hotel room pricing.
Summary and conclusion
From all the above analyses and inferences, we conclude that the star rating of the hotel,
hotel capacity and the availability of a swimming pool are the factors that affect hotel room
rent in the Indian market, and all three factors are positively correlated with room rent.
Study Group_G3 – Abhishek Saxena, Anmol Garg, Dilpreet Kaur, Kratgya Gupta, Richa
Agarwal, Vaishali Sharma