The correlation coefficient measures only the degree of linear association between two
variables. It cannot be used to estimate or predict the response variable for a given
value of the independent variable. The response variable is also called the dependent variable.
In the present problem involving Sigma Property, '% Occupancy' is the independent
variable and 'Revenue' is the dependent variable, since revenue depends on % occupancy.
Using regression analysis, it is possible to predict revenue for a given % occupancy.
Regression Model
Simple Linear Regression Model: In this model, the dependent variable is a linear function
of one independent variable. For the present case, Revenue may be modelled as a linear
function of % occupancy. Based on sample data collected for the dependent and
independent variables, a model is postulated connecting the dependent variable with the
independent variable in the form of a linear equation. Symbolically, we write the sample
regression line as follows:
ŷ = b0 + b1 x1
where
ŷ is the estimate of the dependent variable (revenue)
x1 is the independent variable (% occupancy)
b0 and b1 are determined by the statistical method of least squares. b1 is called the
regression coefficient (slope) and b0 is the constant term (intercept).
Historical Perspective
As background, it is worth pointing out that the estimates of b0 and b1 obtained by
the method of least squares are called 'Best Linear Unbiased Estimates' (BLUE), a result
due to Gauss and Markoff in the context of General Linear Models, which also cover
Multiple Linear Regression.
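The values of b0 and b1 are obtained by solving the normal equations that are given
below:
$\sum y = n b_0 + b_1 \sum x_1$
$\sum x_1 y = b_0 \sum x_1 + b_1 \sum x_1^2$
Solving them gives
$b_1 = \dfrac{\sum (x_1 - \bar{x}_1)(y - \bar{y})}{\sum (x_1 - \bar{x}_1)^2}$
$b_0 = \bar{y} - b_1 \bar{x}_1$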
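The step from the two normal equations to the closed-form expressions for b0 and b1 can be sketched as follows. This is a standard derivation written out for convenience rather than taken from the original notes. It uses n for the number of observations and the bars for sample means.

```latex
\begin{align*}
\sum y = n b_0 + b_1 \sum x_1
  &\;\Longrightarrow\; b_0 = \bar{y} - b_1 \bar{x}_1 \\
\sum x_1 y = b_0 \sum x_1 + b_1 \sum x_1^2
  &\;\Longrightarrow\; \sum x_1 y - n \bar{x}_1 \bar{y}
     = b_1 \Bigl( \sum x_1^2 - n \bar{x}_1^2 \Bigr) \\
b_1 &= \frac{\sum x_1 y - n \bar{x}_1 \bar{y}}{\sum x_1^2 - n \bar{x}_1^2}
     = \frac{\sum (x_1 - \bar{x}_1)(y - \bar{y})}{\sum (x_1 - \bar{x}_1)^2}
\end{align*}
```

The first line divides the first normal equation by n. The second line substitutes that expression for b0 into the second normal equation.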
To understand the nitty-gritty of simple regression, let us take the present problem for
which we give below the relevant data.
Revenue($) %Occupancy
514,440 65.7
463,115 61.1
598,182 78.2
454,924 65.4
453,803 63.5
502,228 70.6
626,262 81.2
498,703 72
514,458 72.9
623,291 81.7
454,768 62.1
385,573 53.4
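As a minimal sketch, the least squares estimates can be computed directly from the data above with the deviation formulas for b1 and b0. The function name `fit_simple_regression` and the variable names are illustrative choices, not part of the original notes. The result should agree with the regression output to within rounding.

```python
# Sigma Property sample: revenue ($) and % occupancy, copied from the table above.
revenue = [514440, 463115, 598182, 454924, 453803, 502228,
           626262, 498703, 514458, 623291, 454768, 385573]
occupancy = [65.7, 61.1, 78.2, 65.4, 63.5, 70.6,
             81.2, 72.0, 72.9, 81.7, 62.1, 53.4]


def fit_simple_regression(x, y):
    """Return (b0, b1) using the least squares deviation formulas."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sum of cross-products and sum of squares of deviations from the means.
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = s_xy / s_xx           # slope
    b0 = y_bar - b1 * x_bar    # intercept
    return b0, b1


b0, b1 = fit_simple_regression(occupancy, revenue)
print(f"intercept b0 = {b0:.2f}, slope b1 = {b1:.3f}")
```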
You postulate the model for the population in the standard form as follows:
Y = β0 + β1X1 + ε
Y is the Revenue measured in $, β0 is the intercept, β1 is the slope corresponding to the
independent variable X1 (% Occupancy) and ε is the random error term.
Regression Statistics
Multiple R           0.95806386
R Square             0.91788636
Adjusted R Square    0.909674996
Standard Error       22417.56601
Observations         12
ANOVA
              df    SS             MS             F           Significance F
Regression    1     56175962978    56175962978    111.7824    9.51905E-07
Residual      10    5025472657     502547265.7
Total         11    61201435635
              Coefficients    Standard Error    t Stat          P-value
Intercept     -60376.5        54097.94734       -1.116059834    0.29049627
%Occupancy    8231.778        778.5864232       10.57272182     9.51905E-07
1) Regression Equation
The fitted equation is ŷ = -60376.55 + 8231.78x1. The equation implies that for every
1 percentage point increase in occupancy, the predicted revenue increases by $8231.78.
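To illustrate how the fitted line is used for prediction, the short sketch below evaluates it at a chosen occupancy. The 70% figure is an arbitrary illustrative value inside the observed range, and `predict_revenue` is a hypothetical helper name.

```python
# Coefficients of the fitted line as reported in the text.
B0 = -60376.55   # intercept ($)
B1 = 8231.78     # slope ($ per percentage point of occupancy)


def predict_revenue(occupancy_pct):
    """Predicted revenue ($) for a given % occupancy."""
    return B0 + B1 * occupancy_pct


# 70% is an illustrative value inside the observed range (53.4 to 81.7).
at_70 = predict_revenue(70.0)
# One extra percentage point of occupancy adds B1 dollars to the prediction.
step = predict_revenue(71.0) - predict_revenue(70.0)
print(round(at_70, 2), round(step, 2))
```

Predictions far outside the observed occupancy range should be treated with caution, since the line was fitted only to occupancies between roughly 53% and 82%.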
Multiple R denotes the correlation coefficient between the two variables, namely %
occupancy and revenue. The value R = 0.9581 shows a very strong positive
correlation between revenue and % occupancy.
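In simple regression, R Square equals the regression sum of squares divided by the total sum of squares, and Multiple R is its square root. The sketch below checks this against the sums of squares reported in the ANOVA table. The variable names are illustrative.

```python
import math

# Sums of squares as reported in the ANOVA table of the regression output.
ss_regression = 56175962978
ss_total = 61201435635

r_square = ss_regression / ss_total
# Multiple R is the non-negative square root of R Square.
multiple_r = math.sqrt(r_square)
print(f"R Square = {r_square:.6f}, Multiple R = {multiple_r:.6f}")
```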
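Adjusted R2: When more independent variables are added to the regression model, the
R2 value never decreases, even if the added variables have little explanatory power. It
therefore needs to be corrected for the number of predictors. This is achieved
by Adjusted R2, which is computed by the formula
$\text{Adjusted } R^2 = 1 - (1 - R^2)\,\dfrac{n - 1}{n - k - 1}$
where n is the number of observations and k is the number of independent variables.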
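As a quick check of the Adjusted R2 formula, the sketch below applies it to the simple regression output, where n = 12 observations and k = 1 independent variable. The function name is an illustrative choice.

```python
def adjusted_r_square(r_square, n, k):
    """Adjusted R^2 for n observations and k independent variables."""
    return 1 - (1 - r_square) * (n - 1) / (n - k - 1)


# Simple regression on % occupancy: n = 12, k = 1, R Square from the output.
adj = adjusted_r_square(0.91788636, n=12, k=1)
print(round(adj, 6))
```

The result can be compared with the Adjusted R Square line of the Regression Statistics.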
Standard Error: The standard error of the estimate (the standard deviation of the
residuals about the fitted line) is given by the square root of the mean square
corresponding to the Residual term in the ANOVA table that follows the Regression Statistics.
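The calculation can be sketched from the Residual row of the ANOVA table, assuming the residual degrees of freedom are n - k - 1. The variable names are illustrative.

```python
import math

ss_residual = 5025472657   # Residual sum of squares from the ANOVA table
n = 12                     # observations
k = 1                      # independent variables

# Residual mean square = residual SS / residual degrees of freedom.
ms_residual = ss_residual / (n - k - 1)
standard_error = math.sqrt(ms_residual)
print(f"{standard_error:.2f}")
```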
Total Sum of Squares = $\sum (y - \bar{y})^2$ = 61201435635
The mean squares due to regression and error are obtained by dividing each sum of
squares by its degrees of freedom. The computed F statistic is the ratio of the
regression mean square to the residual mean square. Here the calculated
F = 111.7824.
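The arithmetic behind the ANOVA table can be reproduced in a few lines. This is a minimal sketch using the reported sums of squares and degrees of freedom, with illustrative variable names.

```python
# Sums of squares and degrees of freedom from the ANOVA table.
ss_regression, df_regression = 56175962978, 1
ss_residual, df_residual = 5025472657, 10

# The regression and residual components add up to the total sum of squares.
ss_total = ss_regression + ss_residual

# Mean square = sum of squares / degrees of freedom.
ms_regression = ss_regression / df_regression
ms_residual = ss_residual / df_residual

# F is the ratio of the two mean squares.
f_stat = ms_regression / ms_residual
print(ss_total, round(f_stat, 4))
```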
Null Hypothesis: There is no linear relationship between Y and X1 in the population
regression line, that is, the slope is zero (β1 = 0). The F test does not test the intercept β0.
Since the P-value (9.51905E-07) is less than α (5%), the null hypothesis
is rejected. The conclusion is that revenue is linearly related to % occupancy at the 5% level
of significance.
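With a single independent variable, the F test and the t test on the slope are equivalent: F equals the square of the slope's t statistic, which is why both show the same P-value in the output. The sketch below checks this with the reported figures.

```python
t_slope = 10.57272182   # t Stat for %Occupancy from the coefficient table
f_stat = 111.7824       # F from the ANOVA table

# In simple regression the squared t statistic of the slope equals F.
t_squared = t_slope ** 2
print(round(t_squared, 4))
```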
Multiple Linear Regression Model: With two independent variables, you postulate the model
for the population in the standard form as follows:
Y = β0 + β1X1 + β2X2 + ε
Y is the Revenue measured in $, β0 is the intercept, β1 is the slope corresponding to the
independent variable X1 (% Individual Occupancy), β2 is the slope corresponding to the
independent variable X2 (% Group Occupancy) and ε is the random error term.
Revenue($)    % Individual Occupancy    % Group Occupancy
514,440       42.30                     23.44
463,115       36.82                     24.32
598,182       45.40                     32.79
454,924       38.78                     26.67
453,803       42.31                     21.22
502,228       40.65                     29.41
626,262       40.00                     39.76
498,703       37.66                     33.10
514,458       37.49                     34.20
623,291       41.96                     38.68
454,768       34.29                     27.32
385,573       36.04                     16.52
Output (Table 2)
Regression Statistics
Multiple R           0.963021796
R Square             0.92741098
Adjusted R Square    0.911280087
Standard Error       22217.49119
Observations         12
ANOVA
              df    SS          MS             F           Significance F
Regression    2     5.68E+10    28379441702    57.49285    7.48E-06
Residual      9     4.44E+09    493616914.7
Total         11    6.12E+10
                          Coefficients    Standard Error    t Stat          P-value     Lower 95%    Upper 95%
Intercept                 -110027.553     83255.78          -1.321560533    0.218921    -298365      78310.25
% Individual Occupancy    9649.624857     2166.151          4.454733798     0.001589    4749.448     14549.8
% Group Occupancy         8171.183638     983.4503          8.308690357     1.63E-05    5946.463     10395.9
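The two-variable fit can be sketched in plain Python by solving the normal equations in deviation form, which reduces to a 2 x 2 linear system for the slopes. The function name `fit_two_predictors` and the variable names are illustrative. Because the tabulated occupancy figures are rounded to two decimals, the result should come out close to, but not necessarily identical with, the coefficients in Table 2.

```python
# Sigma Property sample, copied from the data table for the two-variable model.
revenue = [514440, 463115, 598182, 454924, 453803, 502228,
           626262, 498703, 514458, 623291, 454768, 385573]
individual = [42.30, 36.82, 45.40, 38.78, 42.31, 40.65,
              40.00, 37.66, 37.49, 41.96, 34.29, 36.04]
group = [23.44, 24.32, 32.79, 26.67, 21.22, 29.41,
         39.76, 33.10, 34.20, 38.68, 27.32, 16.52]


def fit_two_predictors(x1, x2, y):
    """Least squares fit of y = b0 + b1*x1 + b2*x2; returns (b0, b1, b2)."""
    n = len(y)
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
    # Deviations from the sample means.
    d1 = [v - m1 for v in x1]
    d2 = [v - m2 for v in x2]
    dy = [v - my for v in y]
    # Sums of squares and cross-products of the deviations.
    s11 = sum(a * a for a in d1)
    s22 = sum(b * b for b in d2)
    s12 = sum(a * b for a, b in zip(d1, d2))
    s1y = sum(a * c for a, c in zip(d1, dy))
    s2y = sum(b * c for b, c in zip(d2, dy))
    # Solve the 2 x 2 system for the slopes by Cramer's rule.
    det = s11 * s22 - s12 ** 2
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    b0 = my - b1 * m1 - b2 * m2
    return b0, b1, b2


b0, b1, b2 = fit_two_predictors(individual, group, revenue)
print(f"b0 = {b0:.1f}, b1 = {b1:.1f}, b2 = {b2:.1f}")
```

For more than two predictors a matrix-based solver is the usual choice. The closed form is used here only to keep the sketch free of external libraries.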
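1) Regression Equation
The fitted equation is ŷ = -110027.55 + 9649.62x1 + 8171.18x2, with the coefficients
taken from Table 2. Holding the other variable fixed, every 1 percentage point increase in
individual occupancy raises predicted revenue by $9649.62, and every 1 percentage point
increase in group occupancy raises it by $8171.18.
Multiple R denotes the multiple correlation coefficient, that is, the correlation between
observed revenue and the revenue predicted from % individual occupancy and % group
occupancy. The value R = 0.9630 shows a very strong positive association between
revenue and the two occupancy variables.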
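Adjusted R2: When more independent variables are added to the regression model, the
R2 value never decreases, so it needs to be corrected for the number of predictors. This
is achieved by Adjusted R2, computed by the formula
$\text{Adjusted } R^2 = 1 - (1 - R^2)\,\dfrac{n - 1}{n - k - 1}$
With n = 12 observations and k = 2 independent variables,
Adjusted R2 = 1 - (1 - 0.9274)(11/9) = 0.9113.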
Standard Error: The standard error of the estimate is given by the square root of the
mean square corresponding to the Residual term in the ANOVA table that follows the
Regression Statistics. Here it is the square root of 493616914.7, which is 22217.49.
For the overall F test, the null hypothesis is that both slopes are zero (β1 = 0 and β2 = 0).
Since the P-value (7.48E-06) is less than α (5%), the null hypothesis is
rejected. The conclusion is that revenue is linearly related to % individual occupancy
and % group occupancy at the 5% level of significance. The P-values for the individual
coefficients (% Individual Occupancy and % Group Occupancy) based on the t statistic are
0.001589 and 1.63E-05 respectively. Both are well below 0.05, so each coefficient is
significant at the 5% level and both variables are important predictors of the Revenue
of Sigma Property.
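The t statistics and 95% limits in Table 2 follow from each coefficient and its standard error. The sketch below assumes the two-sided 5% critical value of t with 9 degrees of freedom is about 2.262 (a value read from a standard t table, not stated in the original notes), so the limits should match the table only to within rounding.

```python
# Two-sided 5% critical value of t with 9 residual degrees of freedom
# (assumption: taken from a standard t table).
T_CRIT = 2.262

# (coefficient, standard error) pairs as reported in Table 2.
coefficients = {
    "% Individual Occupancy": (9649.624857, 2166.151),
    "% Group Occupancy": (8171.183638, 983.4503),
}

results = {}
for name, (coef, se) in coefficients.items():
    t_stat = coef / se                 # t Stat = coefficient / standard error
    lower = coef - T_CRIT * se         # Lower 95% limit
    upper = coef + T_CRIT * se         # Upper 95% limit
    results[name] = (t_stat, lower, upper)
    print(f"{name}: t = {t_stat:.3f}, 95% CI = ({lower:.1f}, {upper:.1f})")
```

Neither interval contains zero, which is consistent with both coefficients being significant at the 5% level.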