Simple Linear Regression
$$\operatorname{cov}(X, Y) = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{n - 1}$$
• Only concerned with the strength of the relationship
– No causal effect is implied
– Depends on the unit of measurement used for X and Y
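The coefficient of correlation scales the covariance by the two sample standard deviations:

$$r = \frac{\operatorname{cov}(X, Y)}{S_X S_Y}$$

where

$$\operatorname{cov}(X, Y) = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{n - 1}, \qquad S_X = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}}, \qquad S_Y = \sqrt{\frac{\sum_{i=1}^{n} (Y_i - \bar{Y})^2}{n - 1}}$$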
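The two formulas above can be checked with a short Python sketch. It is a minimal illustration, not the author's code: the data are hypothetical toy values, and the function names are invented for this example. It also shows why the covariance depends on the units of X and Y while r does not: rescaling X multiplies the covariance but leaves r unchanged.

```python
import math

def sample_covariance(x, y):
    """Sample covariance cov(X, Y) using the n - 1 divisor."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y need the same length, at least 2")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    return sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (n - 1)

def sample_std(values):
    """Sample standard deviation using the n - 1 divisor."""
    n = len(values)
    mean = sum(values) / n
    return math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))

def correlation(x, y):
    """Coefficient of correlation r = cov(X, Y) / (S_X * S_Y)."""
    return sample_covariance(x, y) / (sample_std(x) * sample_std(y))

# Hypothetical toy data (not from the slides)
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
print(sample_covariance(x, y))      # 1.5
print(round(correlation(x, y), 4))  # 0.7746

# Same X measured in units 100 times smaller: covariance changes, r does not
x_scaled = [100 * v for v in x]
print(sample_covariance(x_scaled, y))      # 150.0
print(round(correlation(x_scaled, y), 4))  # 0.7746
```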
[Figure: scatter plots of Y against X for r = -1, r = -0.6, r = 0, r = +1, r = +0.3 and r = 0]
Dr. Sanjay Rastogi, IIFT, New Delhi.
Using Excel to Find the Correlation Coefficient
• Select Tools/Data Analysis
• Choose Correlation from the selection menu
• Click OK
[Figure: scatter plot of Test #2 Score against test score #1, both axes running from 70 to 100]
• There is a relatively … relationship between test score #1 and test score #2
Types of Relationships
[Figure: scatter plots of Y against X illustrating different types of relationships, including no relationship]
Simple Linear Regression Model

$$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$$

where Yi is the dependent variable, β0 is the population Y intercept, β1 is the population slope coefficient, Xi is the independent variable and εi is the random error term. β0 + β1Xi is the linear component and εi is the random error component.

[Figure: population regression line with intercept β0 and slope β1; at a given Xi, the random error εi is the vertical distance between the observed value of Y and the predicted value of Y on the line]
Simple Linear Regression Equation

$$\hat{Y}_i = b_0 + b_1 X_i$$

where Ŷi is the estimated (or predicted) Y value for observation i, b0 is the estimate of the regression intercept, b1 is the estimate of the regression slope, and Xi is the value of X for observation i.
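The slides do not show how b0 and b1 are computed. The sketch below assumes the standard least squares formulas, b1 = SSXY / SSX and b0 = Ȳ - b1X̄, which minimise the sum of squared residuals. The function name and the toy data are hypothetical.

```python
def least_squares(x, y):
    """Return (b0, b1) minimising the sum of squared residuals."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    ssxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    ssx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = ssxy / ssx            # estimate of the regression slope
    b0 = y_bar - b1 * x_bar    # estimate of the intercept; the line passes through (x_bar, y_bar)
    return b0, b1

# Hypothetical toy data (not from the slides)
b0, b1 = least_squares([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(round(b0, 4), round(b1, 4))  # 2.2 0.6
```

Points lying exactly on a line recover that line, which makes a quick sanity check: `least_squares([0, 1, 2], [3, 5, 7])` gives intercept 3 and slope 2.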
[Figure: scatter plot of the house price data against Square Feet]
ANOVA
              df  SS          MS          F        Significance F
Regression    1   18934.9348  18934.9348  11.0848  0.01039
Residual      8   13665.5652  1708.1957
Total         9   32600.5000

              Coefficients  Standard Error  t Stat   P-value  Lower 95%  Upper 95%
Intercept     98.24833      58.03348        1.69296  0.12892  -35.57720  232.07386
Square Feet   0.10977       0.03297         3.32938  0.01039  0.03374    0.18580
• House price model: scatter plot and regression line

$$\text{house price} = 98.24833 + 0.10977\,(\text{square feet})$$

[Figure: scatter plot of House Price ($1000s) against Square Feet with the fitted regression line; intercept = 98.248, slope = 0.10977]
$$\text{house price} = 98.25 + 0.1098\,(2000) = 317.85$$

The predicted price for a house with 2,000 square feet is 317.85 ($1,000s) = $317,850.
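The prediction is a single substitution into the fitted equation. This sketch uses a hypothetical helper `predict` and compares the rounded coefficients used in the worked example with the unrounded Excel coefficients. Rounding b1 to 0.1098 moves the prediction by about 0.06 ($1,000s), i.e. roughly $60.

```python
def predict(b0, b1, x):
    """Predicted Y (house price, $1000s) for a given X (square feet)."""
    return b0 + b1 * x

rounded = predict(98.25, 0.1098, 2000)    # coefficients as rounded in the worked example
full = predict(98.24833, 0.10977, 2000)   # coefficients as reported in the Excel output
print(round(rounded, 2), round(full, 2))  # 317.85 317.79
```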
Coefficient of Determination, r2
• The coefficient of determination is the portion of
the total variation in the dependent variable that
is explained by variation in the independent
variable
• The coefficient of determination is also called r-squared and is denoted as r²

$$r^2 = \frac{SSR}{SST} = \frac{\text{regression sum of squares}}{\text{total sum of squares}}, \qquad 0 \le r^2 \le 1$$
Examples of Approximate r² Values
[Figure: scatter plots of Y against X illustrating r² = 1, 0 < r² < 1 and r² = 0]
• r² = 0: no linear relationship between X and Y; none of the variation in Y is explained by variation in X
ANOVA
              df  SS          MS          F        Significance F
Regression    1   18934.9348  18934.9348  11.0848  0.01039
Residual      8   13665.5652  1708.1957
Total         9   32600.5000
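Applying the r² definition to the ANOVA sums of squares is simple arithmetic. This sketch copies the three SS values from the table and also checks that the total variation splits into explained and unexplained parts (SST = SSR + SSE). The result, about 0.58, means that roughly 58% of the variation in house price is explained by variation in square feet.

```python
SSR = 18934.9348   # regression sum of squares (ANOVA table)
SSE = 13665.5652   # residual (error) sum of squares
SST = 32600.5000   # total sum of squares

r_squared = SSR / SST
print(round(r_squared, 5))           # 0.58082
print(abs(SSR + SSE - SST) < 1e-6)   # True
```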
Standard error of the estimate:

$$S_{YX} = \sqrt{\frac{SSE}{n - 2}} = \sqrt{\frac{\sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2}{n - 2}}$$

where
SSE = error sum of squares
n = sample size
Regression Statistics
Multiple R          0.76211
R Square            0.58082
Adjusted R Square   0.52842
Standard Error      41.33032   (= S_YX)
Observations        10
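Three of the Regression Statistics follow directly from the ANOVA sums of squares. This sketch assumes the usual definitions: S_YX = √(SSE / (n − 2)), adjusted r² = 1 − (1 − r²)(n − 1)/(n − k − 1) with k = 1 independent variable, and Multiple R = √r² (which equals |r| in simple regression). The adjusted r² formula is not stated on the slides.

```python
import math

n = 10            # observations
k = 1             # independent variables in a simple regression
SSE = 13665.5652  # residual sum of squares (ANOVA table)
SST = 32600.5000  # total sum of squares (ANOVA table)
r_squared = 1 - SSE / SST

s_yx = math.sqrt(SSE / (n - 2))                              # standard error of the estimate
adj_r_squared = 1 - (1 - r_squared) * (n - 1) / (n - k - 1)  # penalises extra predictors
multiple_r = math.sqrt(r_squared)                            # |r| in simple regression

print(round(s_yx, 5), round(adj_r_squared, 5), round(multiple_r, 5))  # 41.33032 0.52842 0.76211
```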
Comparing Standard Errors
S_YX is a measure of the variation of observed Y values around the regression line.
[Figure: two scatter plots of Y against X with fitted lines, one with a small S_YX and one with a large S_YX]
The magnitude of S_YX should always be judged relative to the size of the Y values in the sample data.
For example, S_YX = $41.33K is moderately small relative to house prices in the $200K to $300K range.
Assumptions of Regression
• Linearity
– The underlying relationship between X and Y is linear
• Independence of Errors
– Error values are statistically independent
• Normality of Error
– Error values (ε) are normally distributed for any given
value of X
• Equal Variance (Homoscedasticity)
– The probability distribution of the errors has constant
variance
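Standard error of the regression slope:

$$S_{b_1} = \frac{S_{YX}}{\sqrt{SSX}} = \frac{S_{YX}}{\sqrt{\sum_{i=1}^{n} (X_i - \bar{X})^2}}$$

where:
S_b1 = estimate of the standard error of the least squares slope
S_YX = √(SSE / (n − 2)) = standard error of the estimate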
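The chain SSE → S_YX → S_b1 can be followed end to end on raw data. A minimal sketch: the helper name `slope_standard_error` and the toy data are hypothetical, and the least squares fit inside the function uses the standard formulas b1 = SSXY / SSX and b0 = Ȳ − b1X̄.

```python
import math

def slope_standard_error(x, y):
    """Return (b1, S_b1) for a simple linear regression of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    ssx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / ssx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))  # error sum of squares
    s_yx = math.sqrt(sse / (n - 2))    # standard error of the estimate
    return b1, s_yx / math.sqrt(ssx)   # S_b1 = S_YX / sqrt(SSX)

# Hypothetical toy data (not from the slides)
b1, s_b1 = slope_standard_error([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(round(b1, 4), round(s_b1, 4))  # 0.6 0.2828
```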
t test for the population slope:

$$t = \frac{b_1 - \beta_1}{S_{b_1}}, \qquad \text{d.f.} = n - 2$$

where:
b1 = regression slope coefficient
β1 = hypothesized slope
S_b1 = standard error of the slope
Inference about the Slope: t Test (continued)

H0: β1 = 0
H1: β1 ≠ 0

From Excel output:
              Coefficients  Standard Error  t Stat   P-value
Intercept     98.24833      58.03348        1.69296  0.12892
Square Feet   0.10977       0.03297         3.32938  0.01039

$$t = \frac{b_1 - \beta_1}{S_{b_1}} = \frac{0.10977 - 0}{0.03297} = 3.32938$$
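d.f. = 10 − 2 = 8; with α/2 = .025 in each tail, the critical values are ±tα/2 = ±2.3060
[Figure: t distribution with rejection regions below −tα/2 and above +tα/2, and a non-rejection region between them]
Decision: since t = 3.329 > 2.3060, reject H0
Conclusion: there is sufficient evidence of a linear relationship between square feet and house price at the 5% level of significance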
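The decision rule can be written as a few lines of arithmetic. This sketch takes b1 and S_b1 from the Excel output and hard-codes the critical value 2.3060 quoted on the slides for α/2 = .025 and 8 d.f. (no statistics library is used to look it up). Because the inputs are rounded to five decimals, t comes out as 3.3294 rather than Excel's 3.32938.

```python
b1 = 0.10977       # slope coefficient (Excel output)
s_b1 = 0.03297     # standard error of the slope (Excel output)
beta1_h0 = 0.0     # hypothesized slope under H0
t_crit = 2.3060    # critical value for alpha/2 = .025, d.f. = 8 (from the slides)

t_stat = (b1 - beta1_h0) / s_b1
reject_h0 = abs(t_stat) > t_crit   # two-tailed test
print(round(t_stat, 3), reject_h0)  # 3.329 True
```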
Confidence interval estimate for the slope:

$$b_1 \pm t_{\alpha/2,\,n-2}\, S_{b_1}, \qquad \text{d.f.} = n - 2$$
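Plugging the house price numbers into the interval formula reproduces the Lower 95% and Upper 95% columns of the Excel output. This sketch reuses the slides' critical value of 2.3060 for 8 d.f. Because the interval does not contain 0, it agrees with the two-tailed t test at the 5% level.

```python
b1 = 0.10977      # slope coefficient (Excel output)
s_b1 = 0.03297    # standard error of the slope (Excel output)
t_crit = 2.3060   # t for alpha/2 = .025, d.f. = 8 (from the slides)

half_width = t_crit * s_b1
lower, upper = b1 - half_width, b1 + half_width
print(round(lower, 5), round(upper, 5))  # 0.03374 0.1858
```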
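t test for the correlation coefficient:

$$t = \frac{r - \rho}{\sqrt{\dfrac{1 - r^2}{n - 2}}}, \qquad \text{d.f.} = n - 2$$

where r = +√r² if b1 > 0 and r = −√r² if b1 < 0.

For the house price data (H0: ρ = 0, H1: ρ ≠ 0):

$$t = \frac{0.76211 - 0}{\sqrt{\dfrac{1 - 0.58082}{10 - 2}}} = 3.329$$

d.f. = 10 − 2 = 8; with α/2 = .025 in each tail, the critical values are ±2.3060
[Figure: t distribution with rejection regions below −2.3060 and above +2.3060; t = 3.329 falls in the upper rejection region]
Decision: reject H0
Conclusion: there is evidence of a linear association at the 5% level of significance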
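A short sketch of the correlation test, taking R Square from the Excel output and the sign of r from the sign of b1. The critical value 2.3060 is again the slides' value for 8 d.f. In simple regression this t equals the t statistic for the slope (up to rounding), so the two tests always reach the same decision.

```python
import math

r_squared = 0.58082   # R Square (Excel output)
b1 = 0.10977          # slope; its sign gives the sign of r
n = 10
t_crit = 2.3060       # t for alpha/2 = .025, d.f. = 8 (from the slides)

r = math.sqrt(r_squared) if b1 > 0 else -math.sqrt(r_squared)
t_stat = (r - 0) / math.sqrt((1 - r_squared) / (n - 2))   # H0: rho = 0
print(round(t_stat, 3), abs(t_stat) > t_crit)  # 3.329 True
```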
Time Series & Forecasting
Forecasting - Basics
Why Forecasting?
• Demand or sales forecasting is the foundation on which all business planning is built.
• Corporate plans, turnaround plans and competitive business strategies all need forecasts. Not forecasting amounts to assuming the status quo and doing nothing, which no manager in any organization should accept.
• Expert Opinion
• Market Survey
• Delphi Method
• Historical Analogy
• Yes, because you can find the value of … that gives the minimum mean square error.
• No, because mean square error is heavily influenced by the squares of individual errors, so a few large errors can dominate it.
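The second answer can be illustrated by comparing mean square error with mean absolute deviation (MAD). MAD is brought in here only as a point of comparison and is not named in the text above. In this sketch, two hypothetical sets of forecast errors have the same total absolute error, but the single large miss roughly multiplies the MSE by 2.6.

```python
def mse(errors):
    """Mean square error of forecast errors."""
    return sum(e ** 2 for e in errors) / len(errors)

def mad(errors):
    """Mean absolute deviation of forecast errors."""
    return sum(abs(e) for e in errors) / len(errors)

# Hypothetical forecast errors: equal total absolute error, different spread
steady = [3, -3, 3, -3, 2]
one_big_miss = [1, -1, 1, -1, 10]

print(mad(steady), mad(one_big_miss))  # 2.8 2.8
print(mse(steady), mse(one_big_miss))  # 8.0 20.8
```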