Correlation and Regression

Correlation & Simple Linear Regression
Dr. Sanjay Rastogi, IIFT, New Delhi.

Sample Covariance
• The sample covariance measures the strength of the
linear relationship between two variables (called
bivariate data)
• The sample covariance:
\[ \operatorname{cov}(X, Y) = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{n - 1} \]
• Only concerned with the strength of the relationship
– No causal effect is implied
– Depends on the unit of measurement used for X and Y

• Covariance between two random
variables:
cov(X,Y) > 0: X and Y tend to move in the same direction
cov(X,Y) < 0: X and Y tend to move in opposite directions
cov(X,Y) = 0: X and Y have no linear relationship (zero covariance alone does not imply independence)

Coefficient of Correlation
• Measures the relative strength of the linear
relationship between two variables
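• Sample coefficient of correlation:
\[ r = \frac{\operatorname{cov}(X, Y)}{S_X S_Y} \]
where
\[ \operatorname{cov}(X, Y) = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{n - 1}, \qquad S_X = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}}, \qquad S_Y = \sqrt{\frac{\sum_{i=1}^{n} (Y_i - \bar{Y})^2}{n - 1}} \]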

Features of r:
• Unit free
• Ranges between –1 and 1
• The closer to –1, the stronger the negative linear
relationship
• The closer to 1, the stronger the positive linear
relationship
• The closer to 0, the weaker the linear relationship
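The covariance and correlation formulas above can be sketched in a few lines of Python. This is only an illustrative sketch: it assumes the ten house price and square footage observations from the regression example later in these slides, and the function names `sample_cov` and `sample_corr` are hypothetical, not part of the original material.

```python
import math

# Data from the house price example in these slides:
# Y = house price in $1000s, X = size in square feet.
prices = [245, 312, 279, 308, 199, 219, 405, 324, 319, 255]
sqft = [1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700]


def sample_cov(x, y):
    """Sample covariance with the n - 1 divisor."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    return sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (n - 1)


def sample_corr(x, y):
    """Sample correlation r = cov(X, Y) / (S_X * S_Y)."""
    s_x = math.sqrt(sample_cov(x, x))  # cov(X, X) is the sample variance of X
    s_y = math.sqrt(sample_cov(y, y))
    return sample_cov(x, y) / (s_x * s_y)


cov_xy = sample_cov(sqft, prices)
r = sample_corr(sqft, prices)
print(f"cov = {cov_xy:.2f}, r = {r:.5f}")
```

Expressing the prices in dollars instead of $1000s would multiply the covariance by 1000 but leave r unchanged, which is what "unit free" means in the list above.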

Scatter Plots of Data
[Figure: six scatter plots of Y against X, illustrating r = -1, r = -.6, r = 0, r = +1, r = +.3, and r = 0]
Using Excel to Find the Correlation
Coefficient
• Select
Tools/Data Analysis
• Choose Correlation
from the selection menu
• Click OK . . .

Using Excel
• Input data range and select appropriate options
• Click OK to get output

Interpreting the Result
• r = .733
• There is a relatively strong positive linear relationship between test score #1 and test score #2
• Students who scored high on the first test tended to score high on the second test, and students who scored low on the first test tended to score low on the second test
[Figure: Scatter Plot of Test Scores; X axis: Test #1 Score, Y axis: Test #2 Score, both from 70 to 100]

Correlation vs. Regression
• A scatter diagram can be used to show the
relationship between two variables
• Correlation analysis is used to measure
strength of the association (linear
relationship) between two variables
– Correlation is only concerned with strength of the
relationship
– No causal effect is implied with correlation

Regression Analysis
• Regression analysis is used to:
– Predict the value of a dependent variable based on the
value of at least one independent variable
– Explain the impact of changes in an independent variable
on the dependent variable
Dependent variable: the variable we wish to predict
or explain
Independent variable: the variable used to explain
the dependent variable

Simple Linear Regression Model
• Only one independent variable, X

• Relationship between X and Y is
described by a linear function
• Changes in Y are assumed to be
caused by changes in X

Types of Relationships
[Figure: scatter plots of Y against X illustrating linear relationships, curvilinear relationships, strong relationships, weak relationships, and no relationship]
Simple Linear Regression Model
\[ Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i \]
where
Y_i = dependent variable
β0 = population Y intercept
β1 = population slope coefficient
X_i = independent variable
ε_i = random error term
β0 + β1 X_i is the linear component; ε_i is the random error component

Simple Linear Regression Model
[Figure: the line Y_i = β0 + β1 X_i + ε_i plotted as Y against X, with intercept β0 and slope β1; for a given X_i, the random error ε_i is the vertical distance between the observed value of Y and the predicted value of Y on the line]
Simple Linear Regression Equation
The simple linear regression equation provides an estimate of the population regression line:
\[ \hat{Y}_i = b_0 + b_1 X_i \]
where
Ŷ_i = estimated (or predicted) Y value for observation i
b0 = estimate of the regression intercept
b1 = estimate of the regression slope
X_i = value of X for observation i
The individual random error terms e_i have a mean of zero

• b0 and b1 are obtained by finding the values of b0 and b1 that minimize the sum of the squared differences between Y and Ŷ:
\[ \min \sum (Y_i - \hat{Y}_i)^2 = \min \sum \bigl(Y_i - (b_0 + b_1 X_i)\bigr)^2 \]

Interpretation
• b0 is the estimated average value of Y when the value of X is zero
• b1 is the estimated change in the average value of Y as a result of a one-unit change in X

Example
• A real estate agent wishes to examine the
relationship between the selling price of a
home and its size (measured in square feet)
• A random sample of 10 houses is selected
– Dependent variable (Y) = house price in
$1000s
– Independent variable (X) = square feet

House Price in $1000s (Y)   Square Feet (X)
245                         1400
312                         1600
279                         1700
308                         1875
199                         1100
219                         1550
405                         2350
324                         2450
319                         1425
255                         1700

Graphical Presentation
[Figure: scatter plot of House Price ($1000s) against Square Feet]

Regression Using Excel
• Tools / Data Analysis / Regression

Output:
The regression equation is:
house price = 98.24833 + 0.10977 (square feet)

Regression Statistics
Multiple R          0.76211
R Square            0.58082
Adjusted R Square   0.52842
Standard Error      41.33032
Observations        10

ANOVA
             df   SS           MS           F         Significance F
Regression   1    18934.9348   18934.9348   11.0848   0.01039
Residual     8    13665.5652   1708.1957
Total        9    32600.5000

              Coefficients   Standard Error   t Stat    P-value   Lower 95%   Upper 95%
Intercept     98.24833       58.03348         1.69296   0.12892   -35.57720   232.07386
Square Feet   0.10977        0.03297          3.32938   0.01039   0.03374     0.18580
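The intercept and slope in the Excel output can be reproduced by hand. The sketch below is a minimal illustration, not the method Excel itself uses: it applies the standard closed-form least squares solution, b1 = Σ(Xi − X̄)(Yi − Ȳ) / Σ(Xi − X̄)² and b0 = Ȳ − b1·X̄, to the ten houses in the example. The function name `fit_line` is an assumption made for this illustration.

```python
# Data from the house price example:
# Y = house price in $1000s, X = size in square feet.
prices = [245, 312, 279, 308, 199, 219, 405, 324, 319, 255]
sqft = [1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700]


def fit_line(x, y):
    """Return (b0, b1) minimizing the sum of squared residuals."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    ssx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / ssx            # estimated slope
    b0 = y_bar - b1 * x_bar   # estimated intercept
    return b0, b1


b0, b1 = fit_line(sqft, prices)
predicted_2000 = b0 + b1 * 2000  # predicted price ($1000s) at 2000 sq. ft.
print(f"b0 = {b0:.5f}, b1 = {b1:.5f}, prediction = {predicted_2000:.2f}")
```

With unrounded coefficients the prediction for a 2000 square foot house comes out near 317.78 ($1000s); the slides round the coefficients to 98.25 and 0.1098 first, which gives 317.85.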
• House price model: scatter plot and regression line
[Figure: scatter plot of House Price ($1000s) against Square Feet with the fitted regression line; slope = 0.10977, intercept = 98.248]

Interpretation
• b0 is the estimated average value of Y when the value of X is zero (if X = 0 is in the range of observed X values)
– Here, no houses had 0 square feet, so b0 = 98.24833 just indicates that, for houses within the range of sizes observed, $98,248.33 is the portion of the house price not explained by square feet

• b1 measures the estimated change in the average value of Y as a result of a one-unit change in X
– Here, b1 = .10977 tells us that the value of a house increases by .10977($1000) = $109.77, on average, for each additional square foot of size

Predictions using Regression Analysis
Predict the price for a house with 2000 square feet:
\[ \text{house price} = 98.25 + 0.1098\,(\text{sq. ft.}) = 98.25 + 0.1098(2000) = 317.85 \]
The predicted price for a house with 2000 square feet is 317.85 ($1,000s) = $317,850

Measures of Variation
• Total variation is made up of two parts:
\[ SST = SSR + SSE \]
(Total Sum of Squares = Regression Sum of Squares + Error Sum of Squares)
\[ SST = \sum (Y_i - \bar{Y})^2, \qquad SSR = \sum (\hat{Y}_i - \bar{Y})^2, \qquad SSE = \sum (Y_i - \hat{Y}_i)^2 \]
where:
Ȳ = average value of the dependent variable
Y_i = observed values of the dependent variable
Ŷ_i = predicted value of Y for the given X_i value

• SST = total sum of squares
– Measures the variation of the Yi values around
their mean Y
• SSR = regression sum of squares
– Explained variation attributable to the
relationship between X and Y
• SSE = error sum of squares
– Variation attributable to factors other than the
relationship between X and Y
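The decomposition can be checked numerically. The following sketch assumes the house price data from the example and refits the line with the standard closed-form least squares formulas; variable names are illustrative. It also computes the ratio SSR/SST, which these slides go on to call the coefficient of determination.

```python
# Data from the house price example:
# Y = house price in $1000s, X = size in square feet.
prices = [245, 312, 279, 308, 199, 219, 405, 324, 319, 255]
sqft = [1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700]

n = len(prices)
x_bar = sum(sqft) / n
y_bar = sum(prices) / n

# Closed-form least squares estimates of slope and intercept.
b1 = (sum((x - x_bar) * (y - y_bar) for x, y in zip(sqft, prices))
      / sum((x - x_bar) ** 2 for x in sqft))
b0 = y_bar - b1 * x_bar
y_hat = [b0 + b1 * x for x in sqft]  # predicted value for each house

sst = sum((y - y_bar) ** 2 for y in prices)              # total variation
ssr = sum((yh - y_bar) ** 2 for yh in y_hat)             # explained variation
sse = sum((y - yh) ** 2 for y, yh in zip(prices, y_hat)) # unexplained variation
r_squared = ssr / sst

print(f"SST = {sst:.4f}, SSR = {ssr:.4f}, SSE = {sse:.4f}")
print(f"SSR + SSE = {ssr + sse:.4f}, SSR / SST = {r_squared:.5f}")
```

The three sums should agree with the SS column of the ANOVA table in the Excel output, up to rounding.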

[Figure: for one observation at X_i, the plot of Y against X shows the deviations behind SST = Σ(Yi − Ȳ)², SSE = Σ(Yi − Ŷi)², and SSR = Σ(Ŷi − Ȳ)² relative to the regression line and the mean Ȳ]
Coefficient of Determination, r²
• The coefficient of determination is the portion of the total variation in the dependent variable that is explained by variation in the independent variable
• The coefficient of determination is also called r-squared and is denoted as r²
\[ r^2 = \frac{SSR}{SST} = \frac{\text{regression sum of squares}}{\text{total sum of squares}} \]
\[ 0 \le r^2 \le 1 \]

Examples of Approximate r² Values
• r² = 1
Perfect linear relationship between X and Y: 100% of the variation in Y is explained by variation in X
• 0 < r² < 1
Weaker linear relationships between X and Y: some but not all of the variation in Y is explained by variation in X
• r² = 0
No linear relationship between X and Y: the value of Y does not depend on X. (None of the variation in Y is explained by variation in X)
[Figure: scatter plots of Y against X for each of the three cases]

Output
From the ANOVA table of the Excel regression output, SSR = 18934.9348 and SST = 32600.5000, so
\[ r^2 = \frac{SSR}{SST} = \frac{18934.9348}{32600.5000} = 0.58082 \]
This matches the R Square value of 0.58082 in the Regression Statistics.
58.08% of the variation in house prices is explained by variation in square feet

Standard Error of Estimate
• The standard deviation of the variation of observations around the regression line is estimated by
\[ S_{YX} = \sqrt{\frac{SSE}{n - 2}} = \sqrt{\frac{\sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2}{n - 2}} \]
Where
SSE = error sum of squares
n = sample size
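As a quick check of this formula, the sketch below plugs in the error sum of squares and sample size reported for the house price example (SSE = 13665.5652, n = 10). The variable names are illustrative.

```python
import math

sse = 13665.5652  # error sum of squares, from the ANOVA table for the house data
n = 10            # sample size

# Standard error of the estimate: sqrt(SSE / (n - 2)).
s_yx = math.sqrt(sse / (n - 2))
print(f"S_YX = {s_yx:.5f}")
```

The result should agree with the "Standard Error" line of the Regression Statistics (41.33032), in the same units as Y ($1000s).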

Output
From the Regression Statistics of the Excel output (Standard Error row):
\[ S_{YX} = 41.33032 \]
Comparing Standard Errors
S_YX is a measure of the variation of observed Y values from the regression line
[Figure: two scatter plots of Y against X with fitted lines, one with small S_YX and one with large S_YX]
The magnitude of S_YX should always be judged relative to the size of the Y values in the sample data,
e.g., S_YX = $41.33K is moderately small relative to house prices in the $200K - $300K range
Assumptions of Regression
• Linearity
– The underlying relationship between X and Y is linear
• Independence of Errors
– Error values are statistically independent
• Normality of Error
– Error values (ε) are normally distributed for any given
value of X
• Equal Variance (Homoscedasticity)
– The probability distribution of the errors has constant
variance

Residual Analysis
\[ e_i = Y_i - \hat{Y}_i \]
• The residual for observation i, ei, is the difference
between its observed and predicted value
• Check the assumptions of regression by examining
the residuals
– Examine for linearity assumption
– Evaluate independence assumption
– Evaluate normal distribution assumption
– Examine for constant variance for all levels of X
(homoscedasticity)
– Graphical Analysis of Residuals
– Can plot residuals vs. X
Inferences About the Slope
• The standard error of the regression slope coefficient (b1) is estimated by
\[ S_{b_1} = \frac{S_{YX}}{\sqrt{SSX}} = \frac{S_{YX}}{\sqrt{\sum (X_i - \bar{X})^2}} \]
where:
S_b1 = estimate of the standard error of the least squares slope
S_YX = \sqrt{SSE / (n - 2)} = standard error of the estimate

Output
From the coefficients table of the Excel output (Standard Error of the Square Feet coefficient):
\[ S_{b_1} = 0.03297 \]

Comparing Standard Errors of the Slope
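S_b1 is a measure of the variation in the slope of regression lines from different possible samples
[Figure: two plots of Y against X showing regression lines from different samples, one with small S_b1 and one with large S_b1]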

Inference about the Slope: t Test
• t test for a population slope
– Is there a linear relationship between X and Y?
• Null and alternative hypotheses
H0: β1 = 0 (no linear relationship)
H1: β1 ≠ 0 (linear relationship does exist)
• Test statistic
\[ t = \frac{b_1 - \beta_1}{S_{b_1}}, \qquad \text{d.f.} = n - 2 \]
where:
b1 = regression slope coefficient
β1 = hypothesized slope
S_b1 = standard error of the slope
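The slope standard error and the t statistic can be worked through together. This sketch assumes values reported for the house price example in these slides (S_YX = 41.33032, b1 = 0.10977, and the two-tail critical value 2.3060 for 8 d.f. at α = .05) and recomputes SSX from the square footage data. The variable names are illustrative.

```python
import math

sqft = [1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700]
s_yx = 41.33032   # standard error of the estimate, from the Excel output
b1 = 0.10977      # estimated slope, from the Excel output
beta1_h0 = 0.0    # hypothesized slope under H0
t_crit = 2.3060   # two-tail critical value for 8 d.f., alpha = .05 (from the slides)

x_bar = sum(sqft) / len(sqft)
ssx = sum((x - x_bar) ** 2 for x in sqft)  # sum of squared deviations of X

s_b1 = s_yx / math.sqrt(ssx)          # standard error of the slope
t_stat = (b1 - beta1_h0) / s_b1       # test statistic with n - 2 d.f.
reject_h0 = abs(t_stat) > t_crit      # two-tail decision rule

print(f"S_b1 = {s_b1:.5f}, t = {t_stat:.3f}, reject H0: {reject_h0}")
```

Because the inputs are rounded, the t statistic may differ from the Excel value in the last decimal places.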
Inference about the Slope: t Test (continued)
Simple Linear Regression Equation:
house price = 98.25 + 0.1098 (sq.ft.)
The slope of this model is 0.1098
Does square footage of the house affect its sales price?
The data are the 10 houses from the example (house price in $1000s, y; square feet, x).
Inferences about the Slope: t Test Example
H0: β1 = 0
H1: β1 ≠ 0
From Excel output:
              Coefficients   Standard Error   t Stat    P-value
Intercept     98.24833       58.03348         1.69296   0.12892
Square Feet   0.10977 (b1)   0.03297 (S_b1)   3.32938   0.01039
\[ t = \frac{b_1 - \beta_1}{S_{b_1}} = \frac{0.10977 - 0}{0.03297} = 3.32938 \]

Test Statistic: t = 3.329
H0: β1 = 0
H1: β1 ≠ 0
From Excel output: b1 = 0.10977, S_b1 = 0.03297, t = 3.32938
d.f. = 10 - 2 = 8
α/2 = .025 in each tail; critical values: -tα/2 = -2.3060 and tα/2 = 2.3060
Reject H0 if t < -2.3060 or t > 2.3060; otherwise do not reject H0
Decision: Reject H0, since t = 3.329 > 2.3060
Conclusion: There is sufficient evidence that square footage affects house price
P-value = 0.01039
H0: β1 = 0
H1: β1 ≠ 0
From Excel output: the P-value for the Square Feet coefficient is 0.01039
This is a two-tail test, so the p-value is P(t > 3.329) + P(t < -3.329) = 0.01039 (for 8 d.f.)
Decision: P-value < α, so reject H0
Conclusion: There is sufficient evidence that square footage affects house price
F Test for Significance
• F Test statistic:
\[ F = \frac{MSR}{MSE} \]
where
\[ MSR = \frac{SSR}{k}, \qquad MSE = \frac{SSE}{n - k - 1} \]
where F follows an F distribution with k numerator and (n – k – 1) denominator degrees of freedom
(k = the number of independent variables in the regression model)
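The F statistic for the house price example can be recomputed from the sums of squares in the ANOVA table. The sketch below uses SSR, SSE, n, and k as reported in these slides; the comparison with the squared t statistic relies on the standard result that F = t² when there is a single independent variable.

```python
ssr = 18934.9348  # regression sum of squares, from the ANOVA table
sse = 13665.5652  # error sum of squares, from the ANOVA table
n = 10            # sample size
k = 1             # number of independent variables

msr = ssr / k              # mean square regression
mse = sse / (n - k - 1)    # mean square error
f_stat = msr / mse

t_stat = 3.32938           # t statistic for the slope, from the Excel output
print(f"MSR = {msr:.4f}, MSE = {mse:.4f}, F = {f_stat:.4f}, t^2 = {t_stat ** 2:.4f}")
```

The agreement between F and t² is why the Significance F value and the slope's P-value are the same in a simple linear regression.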

Output
\[ F = \frac{MSR}{MSE} = \frac{18934.9348}{1708.1957} = 11.0848 \]
With 1 and 8 degrees of freedom
P-value for the F Test: Significance F = 0.01039

ANOVA
             df   SS           MS           F         Significance F
Regression   1    18934.9348   18934.9348   11.0848   0.01039
Residual     8    13665.5652   1708.1957
Total        9    32600.5000

F Test for Significance
H0: β1 = 0
H1: β1 ≠ 0
α = .05
df1 = 1, df2 = 8
Critical Value: F.05 = 5.32
Test Statistic:
\[ F = \frac{MSR}{MSE} = 11.08 \]
Reject H0 if F > 5.32; otherwise do not reject H0
Decision: Reject H0 at α = 0.05
Conclusion: There is sufficient evidence that house size affects selling price
Confidence Interval Estimate for the Slope
Confidence Interval Estimate of the Slope:
\[ b_1 \pm t_{n-2}\, S_{b_1}, \qquad \text{d.f.} = n - 2 \]
Excel Printout for House Prices:
              Coefficients   Standard Error   t Stat    P-value   Lower 95%   Upper 95%
Intercept     98.24833       58.03348         1.69296   0.12892   -35.57720   232.07386
Square Feet   0.10977        0.03297          3.32938   0.01039   0.03374     0.18580
At 95% level of confidence, the confidence interval for the slope is (0.0337, 0.1858)
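The interval can be reproduced from the slope, its standard error, and the critical t value. This sketch assumes b1 = 0.10977 and S_b1 = 0.03297 from the Excel printout, and the two-tail critical value 2.3060 for 8 d.f. quoted in the t test slides.

```python
b1 = 0.10977     # estimated slope, from the Excel printout
s_b1 = 0.03297   # standard error of the slope, from the Excel printout
t_crit = 2.3060  # t value for 8 d.f. at 95% confidence (from the t test slides)

margin = t_crit * s_b1   # half-width of the interval
lower = b1 - margin
upper = b1 + margin
print(f"95% CI for the slope: ({lower:.4f}, {upper:.4f})")
```

Since the units of Y are $1000s, the interval corresponds to an average increase of roughly $33.70 to $185.80 per additional square foot. The interval does not contain 0, which is consistent with rejecting H0: β1 = 0 at the .05 level.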

t Test for a Correlation Coefficient
• Hypotheses
H0: ρ = 0 (no correlation between X and Y)
H1: ρ ≠ 0 (correlation exists)
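• Test statistic
\[ t = \frac{r - \rho}{\sqrt{\dfrac{1 - r^2}{n - 2}}} \qquad \text{(with } n - 2 \text{ degrees of freedom)} \]
where
\[ r = +\sqrt{r^2} \ \text{ if } b_1 > 0, \qquad r = -\sqrt{r^2} \ \text{ if } b_1 < 0 \]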
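The test statistic for a correlation coefficient is straightforward to compute. The sketch below is illustrative only: the function name `corr_t_stat` is hypothetical, and the inputs r = 0.76211 (the Multiple R value, positive because b1 > 0) and n = 10 are taken from the house price example in these slides.

```python
import math


def corr_t_stat(r, n, rho=0.0):
    """t statistic for H0: population correlation equals rho (n - 2 d.f.)."""
    return (r - rho) / math.sqrt((1 - r ** 2) / (n - 2))


t_stat = corr_t_stat(0.76211, 10)
print(f"t = {t_stat:.3f} with {10 - 2} degrees of freedom")
```

The value agrees, up to rounding, with the t statistic for the slope, as expected in a simple linear regression.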

Example:
Is there evidence of a linear relationship between square feet and house price at the .05 level of significance?
H0: ρ = 0 (no correlation)
H1: ρ ≠ 0 (correlation exists)
α = .05, d.f. = 10 - 2 = 8
\[ t = \frac{r - \rho}{\sqrt{\dfrac{1 - r^2}{n - 2}}} = \frac{.762 - 0}{\sqrt{\dfrac{1 - .762^2}{10 - 2}}} = 3.329 \]
α/2 = .025 in each tail; critical values: -tα/2 = -2.3060 and tα/2 = 2.3060
Reject H0 if t < -2.3060 or t > 2.3060; otherwise do not reject H0
Decision: Reject H0, since t = 3.329 > 2.3060
Conclusion: There is evidence of a linear association at the 5% level of significance
Time Series & Forecasting
Forecasting - Basics
Why Forecasting?
• Demand or sales forecasting is the foundation stone upon which the entire business planning is built.
• An organization cannot predict its profitability without predicting sales revenue. Sales revenue cannot be predicted without forecasting sales in physical quantities.
• The entire production program and materials resource planning cannot be achieved without a realistic sales forecast of the various products the organization would like to market.
• Corporate plans, turnaround plans and competitive business strategies need the help of forecasting. In other words, not to forecast is to assume status quo and do nothing. This will never be acceptable to any manager in any organization.

Forecasting - Basics
Why Forecasting?
• We must of course recognize the fact that the future is uncertain and therefore no forecast can be one hundred percent accurate. This is a paradox in forecasting: on the one hand you need sales forecasts, and on the other hand no forecast can be fully accurate.

• Managers in any business enterprise have no choice between forecasting and not forecasting, because without a sound forecasting system the risk of making a wrong decision increases. Managers do, however, have a choice among the methods of forecasting.

Forecasting Techniques in Practice

Selecting the Right Forecasting
Technique-Guidelines
• Availability of data: If no appropriate historical data are available,
quantitative techniques of forecasting are not possible. Only qualitative
forecasting techniques are possible.

• Accuracy envisaged: The greater the accuracy needed, the greater the need for sophisticated techniques of forecasting.

• Urgency with which the forecast is sought: If forecasts are required urgently, only less sophisticated techniques can be used.

• Cost: This includes cost of forecasting exercise and what it costs the firm if a
wrong forecast is made.

Qualitative Methods of Forecasting
When Do You Use Qualitative Forecasting?

• Imagine your company is about to
introduce a new product that is
unknown in the market. In this context,
there will be no historical data that you
could use to forecast your sales. It is a
situation where you will find complete
absence of any useful data. Under
these circumstances, qualitative
forecasting is the only method by
which you could forecast your sales for
your new product.
Qualitative Methods in Practice
• Expert Opinion
• Market Survey
• Delphi Method
• Historical Analogy

Expert Opinion
• In this method, a group of experts from diverse backgrounds such as marketing, sales, finance, operations, and purchasing is asked to make a forecast for the product under consideration. A consensus is then reached on a forecast figure. Each expert brings a set of biases and perspectives that might influence the forecast. Of course, their judgment would be substantiated by a wealth of information that includes past data, industry growth rates, competitive strategies, and reactions from customers and distributors.

• The advantages of this method: 1) It is fast and efficient. 2) It is timely and based on good information content. 3) It uses the collective knowledge of experts.

• The disadvantages of this method: 1) Experts can make mistakes. 2) Subjectivity and bias of experts can vitiate the forecast. 3) The group dynamics of the experts could be greatly influenced by the degree of dominance of a particular person. Whoever shouts loudest might get their way.

Market Survey
In this method, you conduct a market survey of customers' intentions to buy a product. A carefully designed questionnaire is administered to the selected target audience of customers. Customers are selected independently using a representative random sample. This method is very popular and, if carefully implemented, will give you good results.
• This is the apt technique to use, particularly if you want to forecast sales for a new product or new brand.
• This method of forecasting requires the active cooperation of the target audience.
• The sample size must be reasonably large. The larger the sample size, the smaller the standard error and sampling error will be.
• The larger the sample size, the more time consuming and costly the survey will be. So, you have to strike a balance between sample size and cost.

Delphi Method
• In the expert opinion method of forecasting, a consensus forecast is arrived at after eliciting the opinions and views of experts with diverse backgrounds. Certainly this method is subject to group dynamics (effects). At times, judgments may be highly influenced by the persuasion of some group members who have strong likes and dislikes. The Delphi method attempts to retain the wisdom and accumulated knowledge of a group while simultaneously attempting to reduce the group effects.

• In the Delphi method, group members are asked to make individual assessments about a forecast. These assessments are compiled and then fed back to the members, so that they get the opportunity to compare their judgment with others. They are then given an option to revise their forecasts. After three or four rounds, group members reach their final conclusion.
Historical Analogy
This method is applied when a new product is about to be introduced by a company. Forecasting sales for new products is difficult because of the lack of proper historical data. The historical analogy method attempts to forecast sales for a new product based on the performance of related or similar products in the marketplace. The database of sales of these products forms the basis for forecasting.

Drawbacks of Historical Analogy
• You cannot precisely say how your new product is similar or related
to a particular product.

• Suppose you have a number of products that you feel are similar to
yours. Which of these will you consider as most similar to yours?

• Products that are similar to yours could have failed in the past for a
variety of reasons. Let us say a similar product failed in the past
because whenever there was an advertisement about this product, it
was not available on the shelf. So, the consumers developed a
negative perception about this product and became skeptical about its
availability. You may not know all these and simply conclude your
product will also fail!

Quantitative Methods of Forecasting
• Quantitative forecasting uses statistical analysis of data to forecast sales. Time series analysis and causal models fall under the purview of quantitative forecasting.

Time Series Analysis
• Time series are series of observations
that are taken at regular intervals of
time. Data on weekly sales, monthly
sales, and annual sales are examples
of time series.
• Like many other data sets, if you have
a time series data set, the first step in
analyzing it is to draw a graph,
particularly a simple scatter diagram
or a line graph that will reveal sharply
any underlying patterns.
Components of Time Series
• Trend (T) represents the long-term behavior of a time series. It tells whether the time series data reveal a steady upward or downward movement.

• Seasonal Variation (S) represents variation caused by the season. Typically this shows up as variation in demand between peak and lean seasons. For example, demand for snow tires peaks during winter in the USA.

• Cyclical Variation (C) represents the business cycles that recur at irregular intervals over several years. For example, in the stock market you will witness cycles of boom and cycles of recession that occur once in a while over many years.

• Random Variation (R) represents irregular variations that occur by chance and have no assignable cause. Random variation cannot be predicted.

Essence of Moving Average
• The pattern revealed in the observations varies over the time horizon. Instead of taking the average of all historical data, only the latest n periods of the data are used to get a forecast for the next period. This is the very essence of the moving average forecast.
• Moving Average (MA) Forecast for the next period = Average of the n most recent time series observations.
Moving Average Example Problem
A company is interested in forecasting demand for one of its products. Past data on demand for the last 12 months are available and given below. Using a period of 3 months, make a moving average forecast for period 13 (13th month).

Moving Average:
Month   Sales (100 units)
1       15
2       9
3       16
4       17
5       11
6       20
7       10
8       17
9       12
10      9
11      18
12      20
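The slides pose this problem without showing a worked answer. A minimal sketch of the calculation, averaging the three most recent months from the table above, might look as follows; the function name `moving_average_forecast` is an illustrative choice.

```python
# Monthly sales in hundreds of units, months 1 to 12 (from the table above).
sales = [15, 9, 16, 17, 11, 20, 10, 17, 12, 9, 18, 20]


def moving_average_forecast(series, n):
    """Forecast the next period as the mean of the n most recent observations."""
    if n <= 0 or n > len(series):
        raise ValueError("n must be between 1 and len(series)")
    return sum(series[-n:]) / n


# 3-month moving average: mean of months 10, 11, and 12.
forecast_13 = moving_average_forecast(sales, 3)
print(f"Forecast for month 13: {forecast_13:.2f} (hundred units)")
```

Here the forecast is (9 + 18 + 20) / 3, or about 15.67 hundred units. Changing n changes the forecast, which illustrates why the choice of period matters.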
Drawbacks of Moving Average
• Moving averages do not react well to
seasonal variations
• All observations considered in a time
horizon are given the same weight
• A large amount of historical data
should be gathered and maintained to
update forecast values
• The choice of the period(n) is
generally arbitrary.

Exponential Smoothing
Exponential smoothing is a particular case of moving average in which there are three components:
1) The forecast for the most recent period
2) The actual value for that period
3) A smoothing constant α. This smoothing constant α is a weighting factor that lies between 0 and 1.
The selection of the right value of α is a matter of judgment by the experienced user, but it must be chosen very carefully.

• Exponential smoothing is an excellent forecasting technique for short term forecasting.
• It is used not only in sales forecasting but also in forecasting input prices in materials procurement.
• The single biggest advantage is that this technique is extremely simple to use.

Exponential Smoothing-Formula
New Forecast = α(actual value) + (1 - α)(last forecast).
The meaning of this statement is explained with an example. Suppose the actual sales for month 2 are 50 units and your forecast for month 2 was 55 units. Let us take α = 0.3.

New Forecast = (0.3)(50) + (1 - 0.3)(55) = 53.5.
Exponential Smoothing-Example
A company is interested in forecasting demand for one of its products. Past data on demand for the last 12 months are available and given below. Using the exponential smoothing technique, forecast demand for month 13. Take α = 0.2.

Month   Sales (100 units)
1       15
2       14
3       16
4       17
5       15
6       18
7       20
8       22
9       23
10      21
11      24
12      26
Exponential Smoothing-Example
Solution
• Forecast for Month 13
Forecast for the 13th month
= α(actual value) + (1 - α)(last forecast)
= 0.2(26) + (1 - 0.2)(20.2) = 21.36,
where 26 is the actual value for month 12 and 20.2 is the forecast for month 12.
This is the demand forecast for month 13.
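The forecast of 20.2 for month 12 comes from applying the smoothing formula repeatedly from the start of the series. The slides do not say how the first forecast is initialized; the sketch below assumes the forecast for month 1 equals the actual value for month 1, a common convention that happens to reproduce the 20.2 used above. The function name `exponential_smoothing` is illustrative.

```python
# Monthly sales in hundreds of units, months 1 to 12 (from the example table).
sales = [15, 14, 16, 17, 15, 18, 20, 22, 23, 21, 24, 26]


def exponential_smoothing(series, alpha, initial_forecast=None):
    """Return forecasts for periods 1 .. len(series) + 1.

    Assumption (not stated in the slides): when no initial forecast is
    given, the forecast for period 1 is set to the first actual value.
    """
    if not 0 <= alpha <= 1:
        raise ValueError("alpha must lie between 0 and 1")
    forecast = series[0] if initial_forecast is None else initial_forecast
    forecasts = [forecast]  # forecast for period 1
    for actual in series:
        # New forecast = alpha * actual + (1 - alpha) * last forecast
        forecast = alpha * actual + (1 - alpha) * forecast
        forecasts.append(forecast)
    return forecasts


forecasts = exponential_smoothing(sales, alpha=0.2)
forecast_12 = forecasts[11]  # forecast for month 12
forecast_13 = forecasts[12]  # forecast for month 13
print(f"Month 12 forecast: {forecast_12:.2f}, month 13 forecast: {forecast_13:.2f}")
```

A different starting forecast would shift the early values, but its influence shrinks by a factor of (1 - α) each period.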

Points to Ponder on Selection of Smoothing
Constant
• The question that is often raised in the usage of exponential smoothing is: "Is there a scientific way to fix the value of the smoothing constant α?" The answer is both Yes and No.
• Yes, because you can find the value of α that gives the minimum mean square error.
• No, because mean square error is highly influenced by the square terms of individual errors.
• Judgment of α based on experience and tracking forecasting efficiency is the only way out. This is indeed a weakness of the exponential smoothing method of forecasting.
