
CHAPTER 4:

INTRODUCTORY
LINEAR REGRESSION

CHAPTER OUTLINE:
4.1 SIMPLE LINEAR REGRESSION

- Curve fitting

- Inferences about estimated parameters

- Adequacy of the models

- Linear correlation

4.2 Multiple Linear Regression

Introduction:
 Regression is a statistical procedure for establishing the
relationship between two or more variables.

 This is done by fitting a linear equation to the observed
data.

 The regression line is then used by the researcher to see
the trend and make predictions of values for the data.

 There are two types of relationship:

 Simple (2 variables)
 Multiple (more than 2 variables)
4.1 The Simple Linear Regression Model
 is an equation that describes a dependent variable (Y) in
terms of an independent variable (X) plus a random error, ε:

Y = β0 + β1X + ε

where,
β0 = intercept of the line with the Y-axis
β1 = slope of the line
ε = random error

 The random error, ε, is the difference between a data point and
the corresponding deterministic value.

 The regression line is estimated from the collected data by
fitting a straight line to the data set and obtaining the
equation of that line:

Ŷ = β̂0 + β̂1X
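To make the roles of β0, β1 and ε concrete, here is a minimal Python sketch (illustrative only; the coefficient values, the noise level and the use of numpy are assumptions, not taken from these notes) that simulates data from the model Y = β0 + β1X + ε.

import numpy as np

rng = np.random.default_rng(seed=1)

beta0, beta1 = 2.0, 0.5                    # assumed intercept and slope
sigma = 1.0                                # assumed standard deviation of the random error

x = np.linspace(0, 10, 30)                 # independent variable X
eps = rng.normal(0.0, sigma, size=x.size)  # random error ε with mean zero
y = beta0 + beta1 * x + eps                # dependent variable Y = β0 + β1X + ε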
Example 4.1:

1) A nutritionist studying weight-loss programs might
want to find out whether reducing carbohydrate intake
can help a person reduce weight.
a) X is the carbohydrate intake (independent
variable).
b) Y is the weight (dependent variable).

2) An entrepreneur might want to know whether
increasing the cost of packaging his new product will
have an effect on the sales volume.
a) X is the cost.
b) Y is the sales volume.
4.1.1 CURVE FITTING (SCATTER PLOTS)

 A scatter plot is a graph of ordered pairs (x, y).

 The purpose of a scatter plot is to describe the
nature of the relationship between the
independent variable, X, and the dependent variable,
Y, in a visual way.

 The independent variable, x, is plotted on the
horizontal axis and the dependent variable, y, is
plotted on the vertical axis.
SCATTER DIAGRAM
 Positive Linear Relationship

[Figure: plot of E(y) against x showing a regression line with
intercept β0 and a positive slope β1.]
SCATTER DIAGRAM
 Negative Linear Relationship

[Figure: plot of E(y) against x showing a regression line with
intercept β0 and a negative slope β1.]
SCATTER DIAGRAM
 No Relationship

[Figure: plot of E(y) against x showing a horizontal regression line
with intercept β0 and slope β1 = 0.]
LINEAR REGRESSION MODEL

 A linear regression line can be developed from a
freehand plot of the data.
Example 4.2:
The table below contains values for two variables, X
and Y. Plot the given data and draw a freehand
estimate of the regression line.

X   -3   -2   -1    0    1    2    3
Y    1    2    3    5    8   11   12
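A scatter plot like the one asked for in Example 4.2 can also be produced in Python; the sketch below is only an illustrative aid (matplotlib and numpy are assumed to be available, and the fitted line is added just for comparison with a freehand estimate).

import numpy as np
import matplotlib.pyplot as plt

x = np.array([-3, -2, -1, 0, 1, 2, 3])
y = np.array([1, 2, 3, 5, 8, 11, 12])

plt.scatter(x, y, label="observed (x, y) pairs")   # the scatter plot

b1, b0 = np.polyfit(x, y, deg=1)                   # least squares slope and intercept
plt.plot(x, b0 + b1 * x, label="fitted line")      # line to compare with a freehand estimate

plt.xlabel("X (independent variable)")
plt.ylabel("Y (dependent variable)")
plt.legend()
plt.show()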
4.1.2 INFERENCES ABOUT ESTIMATED PARAMETERS

Least Squares Method

• The least squares method is commonly used to
determine the values of β̂0 and β̂1 that give the best
fit of the estimated regression line to the sample
data points.

• The straight line fitted to the data set is the line:

ŷ = β̂0 + β̂1x
LEAST SQUARES METHOD
Theorem:
 Given the sample data (xi, yi), i = 1, 2, ..., n, the
coefficients of the least squares line ŷ = β̂0 + β̂1x are:

i) y-intercept of the estimated regression equation:

β̂0 = ȳ - β̂1x̄

where x̄ and ȳ are the means of x and y respectively.
LEAST SQUARES METHOD
ii) Slope of the estimated regression equation:

β̂1 = Sxy / Sxx

where

Sxy = Σ xi·yi - (Σ xi)(Σ yi) / n

Sxx = Σ xi² - (Σ xi)² / n

Syy = Σ yi² - (Σ yi)² / n

(all sums taken over i = 1, ..., n)
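As a minimal sketch of how these formulas translate into code (illustrative only; the function name and the use of numpy are assumptions, not part of the slides):

import numpy as np

def least_squares_fit(x, y):
    """Return (b0_hat, b1_hat) for the fitted line y-hat = b0_hat + b1_hat * x."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.size

    sxy = np.sum(x * y) - x.sum() * y.sum() / n   # Sxy
    sxx = np.sum(x ** 2) - x.sum() ** 2 / n       # Sxx

    b1_hat = sxy / sxx                            # slope β̂1 = Sxy / Sxx
    b0_hat = y.mean() - b1_hat * x.mean()         # intercept β̂0 = ȳ - β̂1·x̄
    return b0_hat, b1_hat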
LEAST SQUARES METHOD

• Given any value xi of the independent variable, the
predicted value of the dependent variable, ŷ, can be
found by substituting xi into the equation

ŷ = β̂0 + β̂1x
Example 4.3: Students' scores in history
The data below represent the scores obtained by ten primary
school students before and after they were taken on a
tour to the museum (which is supposed to increase their
interest in history).
Before, x   65  63  76  46  68  72  68  57  36  96
After,  y   68  66  86  48  65  66  71  57  42  87

a) Fit a linear regression model with "before" as the
explanatory variable and "after" as the dependent variable.

b) Predict the score a student would obtain "after" if he scored
60 marks "before".
Solution
n = 10        Σxy = 44435
Σx = 647      Σx² = 44279      x̄ = 64.7
Σy = 656      Σy² = 44884      ȳ = 65.6

Sxy = 44435 - (647)(656)/10 = 1991.8

Sxx = 44279 - (647)²/10 = 2418.1

Syy = 44884 - (656)²/10 = 1850.4
a) β̂1 = Sxy / Sxx = 1991.8 / 2418.1 = 0.8237

   β̂0 = ȳ - β̂1x̄ = 65.6 - (0.8237)(64.7) = 12.3063

   Ŷ = 12.3063 + 0.8237x

b) x = 60

   Ŷ = 12.3063 + 0.8237(60) = 61.7283
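The hand calculation in Example 4.3 can be checked numerically; the sketch below (an illustrative check, not part of the original solution, with numpy assumed) reproduces parts (a) and (b).

import numpy as np

before = np.array([65, 63, 76, 46, 68, 72, 68, 57, 36, 96])   # x
after = np.array([68, 66, 86, 48, 65, 66, 71, 57, 42, 87])    # y

n = before.size
sxy = np.sum(before * after) - before.sum() * after.sum() / n   # 1991.8
sxx = np.sum(before ** 2) - before.sum() ** 2 / n               # 2418.1

b1 = sxy / sxx                            # ≈ 0.8237
b0 = after.mean() - b1 * before.mean()    # ≈ 12.3063

print(b0 + b1 * 60)                       # predicted "after" score ≈ 61.73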
4.1.3 ADEQUACY OF THE MODEL
COEFFICIENT OF DETERMINATION (R²)
 The coefficient of determination is a measure of the
variation of the dependent variable (Y) that is
explained by the regression line and the
independent variable (X).
 The symbol for the coefficient of determination is r²
or R².
 If r = 0.90, then r² = 0.81. It means that 81% of the
variation in the dependent variable (Y) is accounted
for by the variation in the independent variable (X).
 The rest of the variation, 0.19 or 19%, is unexplained
and is called the coefficient of nondetermination.
 The formula for the coefficient of nondetermination is
1.00 - r².

COEFFICIENT OF DETERMINATION (R²)
 Relationship Among SST, SSR, SSE

SST = SSR + SSE

Σ(yi - ȳ)² = Σ(ŷi - ȳ)² + Σ(yi - ŷi)²

where:
SST = total sum of squares
SSR = sum of squares due to regression
SSE = sum of squares due to error

 The coefficient of determination is:

r² = SSR / SST = (Sxy)² / (Sxx·Syy)

where:
SSR = sum of squares due to regression
SST = total sum of squares
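To make the SST = SSR + SSE decomposition concrete, the sketch below (illustrative only, reusing the Example 4.3 data, with numpy assumed) computes r² both as SSR/SST and as (Sxy)²/(Sxx·Syy) and shows that the two agree.

import numpy as np

x = np.array([65, 63, 76, 46, 68, 72, 68, 57, 36, 96])
y = np.array([68, 66, 86, 48, 65, 66, 71, 57, 42, 87])

n = x.size
sxy = np.sum(x * y) - x.sum() * y.sum() / n
sxx = np.sum(x ** 2) - x.sum() ** 2 / n
syy = np.sum(y ** 2) - y.sum() ** 2 / n

b1 = sxy / sxx
b0 = y.mean() - b1 * x.mean()
y_hat = b0 + b1 * x                         # fitted values

sst = np.sum((y - y.mean()) ** 2)           # total sum of squares
ssr = np.sum((y_hat - y.mean()) ** 2)       # sum of squares due to regression
sse = np.sum((y - y_hat) ** 2)              # sum of squares due to error

print(ssr + sse, sst)                       # SSR + SSE = SST
print(ssr / sst, sxy ** 2 / (sxx * syy))    # both give r² ≈ 0.8866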
Example 4.4

1) If r = 0.919, find the value of r² and interpret it.

Solution:
r² = 0.84. It means that 84% of the variation in the
dependent variable (Y) is explained by the
variation in the independent variable (X).
4.1.4 Linear Correlation (r)
 Correlation measures the strength of a linear
relationship between the two variables.
 Also known as Pearson’s product moment
coefficient of correlation.
 The symbol for the sample coefficient of correlation
is r; for the population it is ρ.
 Formula:

r = Sxy / √(Sxx·Syy)

or, equivalently,

r = (sign of β̂1)·√r²
Properties of r :

 1  r  1
 Values of r close to 1 implies there is a strong
positive linear relationship between x and y.
 Values of r close to -1 implies there is a strong
negative linear relationship between x and y.
 Values of r close to 0 implies little or no linear
relationship between x and y.

Refer to Example 4.3: Students' scores in history

c) Calculate the value of r and interpret its meaning.

Solution:

r = Sxy / √(Sxx·Syy)
  = 1991.8 / √((2418.1)(1850.4))
  = 0.9416

Thus, there is a strong positive linear
relationship between the scores obtained before (x)
and after (y) the tour.
Refer to Example 4.3:

The sign of β̂1 in the equation ŷ = 12.3063 + 0.8237x is "+".

Calculate r² first, then use the second formula for r:

r² = 0.8866
r = +√0.8866 = +0.9416
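As a quick numerical check (illustrative only, not from the slides; numpy is assumed), both formulas for r can be evaluated and compared with numpy's built-in correlation coefficient.

import numpy as np

x = np.array([65, 63, 76, 46, 68, 72, 68, 57, 36, 96])
y = np.array([68, 66, 86, 48, 65, 66, 71, 57, 42, 87])

n = x.size
sxy = np.sum(x * y) - x.sum() * y.sum() / n
sxx = np.sum(x ** 2) - x.sum() ** 2 / n
syy = np.sum(y ** 2) - y.sum() ** 2 / n

r_direct = sxy / np.sqrt(sxx * syy)                               # r = Sxy / √(Sxx·Syy) ≈ 0.9416
r_from_r2 = np.sign(sxy / sxx) * np.sqrt(sxy ** 2 / (sxx * syy))  # (sign of slope)·√r²

print(r_direct, r_from_r2, np.corrcoef(x, y)[0, 1])               # all three agree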
Assumptions About the Error Term ε

1. The error ε is a random variable with mean zero.

2. The variance of ε, denoted by σ², is the same for
all values of the independent variable.

3. The values of ε are independent.

4. The error ε is a normally distributed random
variable.
4.1.5 TEST OF SIGNIFICANCE

 To determine whether X provides information in
predicting Y, we proceed with testing the
hypothesis.

 Two tests are commonly used:

i) t-test

ii) F-test
1) t-Test
1. Determine the hypotheses.

   H0: β1 = 0 (no linear relationship)
   H1: β1 ≠ 0 (a linear relationship exists)

2. Determine the critical value at significance level α.

   t(α/2, n-2), or use the p-value approach.

3. Compute the test statistic.

   t = β̂1 / √Var(β̂1)

   where

   Var(β̂1) = [(Syy - β̂1·Sxy) / (n - 2)] / Sxx
1) t-Test
4. Determine the rejection rule.

   Reject H0 if:

   t < -t(α/2, n-2)  or  t > t(α/2, n-2)

   or if p-value < α.

5. Conclusion.

   If H0 is rejected, conclude that there is a significant
   relationship between variables X and Y.
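A hedged sketch of the t-test above as a reusable Python function (illustrative only; the function name is an assumption, and scipy is assumed to be available for the critical value):

import numpy as np
from scipy import stats

def t_test_slope(x, y, alpha=0.05):
    """Test H0: beta1 = 0 against H1: beta1 != 0 in simple linear regression."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    n = x.size

    sxy = np.sum(x * y) - x.sum() * y.sum() / n
    sxx = np.sum(x ** 2) - x.sum() ** 2 / n
    syy = np.sum(y ** 2) - y.sum() ** 2 / n

    b1 = sxy / sxx
    var_b1 = ((syy - b1 * sxy) / (n - 2)) / sxx    # Var(β̂1)
    t_stat = b1 / np.sqrt(var_b1)

    t_crit = stats.t.ppf(1 - alpha / 2, df=n - 2)  # t(α/2, n-2)
    return t_stat, t_crit, abs(t_stat) > t_crit    # reject H0 if |t| exceeds the critical value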
2) F-Test
1. Determine the hypotheses.

   H0: β1 = 0 (no linear relationship)
   H1: β1 ≠ 0 (a linear relationship exists)

2. Specify the level of significance.

   Use the F distribution with 1 degree of freedom (df) in the
   numerator and (n - 2) degrees of freedom in the denominator.

3. Compute the test statistic.

   F = MSR / MSE

4. Determine the rejection rule.

   Reject H0 if:
   p-value < α, or F > F(α, 1, n-2).
2) F-Test
5. Conclusion.

   If H0 is rejected, conclude that there is a significant
   relationship between variables X and Y.
Refer to Example 4.3: Students' scores in history

d) Test whether the scores before and after the trip are
related. Use α = 0.05.

Solution:

1. H0: β1 = 0 (no linear relationship)
   H1: β1 ≠ 0 (a linear relationship exists)

2. α = 0.05, t(0.025, 8) = 2.306

3. Var(β̂1) = [(Syy - β̂1·Sxy) / (n - 2)] / Sxx
           = [(1850.4 - (0.8237)(1991.8)) / 8] / 2418.1
           = 0.0108

   t = β̂1 / √Var(β̂1) = 0.8237 / √0.0108 = 7.926
4. Rejection rule:
   t > t(0.025, 8), since 7.926 > 2.306.

5. Conclusion:
   Thus, we reject H0. The score before the trip (x) has a
   significant linear relationship with the score after the
   trip (y).
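The arithmetic in steps 2-3 can be reproduced from the rounded intermediate values on this slide (an illustrative check; scipy is assumed for the critical value):

import numpy as np
from scipy import stats

b1 = 0.8237                                   # β̂1 from part (a)
syy, sxy, sxx, n = 1850.4, 1991.8, 2418.1, 10

var_b1 = ((syy - b1 * sxy) / (n - 2)) / sxx   # ≈ 0.0108
t_stat = b1 / np.sqrt(var_b1)                 # ≈ 7.9 (the slide's 7.926 uses Var rounded to 0.0108)

t_crit = stats.t.ppf(1 - 0.05 / 2, df=n - 2)  # ≈ 2.306
print(t_stat > t_crit)                        # True → reject H0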
ANALYSIS OF VARIANCE (ANOVA)
The value of the test statistic F for an ANOVA test is
calculated as:

F = MSR / MSE

 To calculate MSR and MSE, first compute the
regression sum of squares (SSR) and the error sum of
squares (SSE).
ANALYSIS OF VARIANCE (ANOVA)
General form of the ANOVA table:

Source of     Degrees of     Sum of     Mean Squares       Value of the
Variation     Freedom (df)   Squares                       Test Statistic
Regression    1              SSR        MSR = SSR/1
Error         n - 2          SSE        MSE = SSE/(n-2)    F = MSR/MSE
Total         n - 1          SST

ANOVA Test
1) Hypotheses: H0: β1 = 0
               H1: β1 ≠ 0
2) Select the distribution to use: the F distribution.
3) Calculate the value of the test statistic F.
4) Determine the rejection and non-rejection regions.
5) Make a decision: reject H0 or do not reject H0.
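A short sketch of this ANOVA computation, applied for illustration to the Example 4.3 data (not part of the original slides; numpy and scipy are assumed):

import numpy as np
from scipy import stats

x = np.array([65, 63, 76, 46, 68, 72, 68, 57, 36, 96])
y = np.array([68, 66, 86, 48, 65, 66, 71, 57, 42, 87])

n = x.size
b1, b0 = np.polyfit(x, y, deg=1)
y_hat = b0 + b1 * x

ssr = np.sum((y_hat - y.mean()) ** 2)      # regression sum of squares, df = 1
sse = np.sum((y - y_hat) ** 2)             # error sum of squares, df = n - 2

msr = ssr / 1
mse = sse / (n - 2)
f_stat = msr / mse                         # equals t² in simple linear regression

f_crit = stats.f.ppf(1 - 0.05, dfn=1, dfd=n - 2)
print(f_stat, f_crit, f_stat > f_crit)     # reject H0 if the statistic exceeds the critical value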
Example 4.5
The manufacturer of Cardio Glide exercise equipment wants to study the
relationship between the number of months since the glide was purchased
and the length of time the equipment was used last week.

1) Determine the regression equation.

2) At α = 0.01, test whether there is a linear relationship between the
variables.
Solution (1):

Regression equation:

Ŷ = 9.939 - 0.637X
Solution (2):
1) Hypotheses:
   H0: β1 = 0
   H1: β1 ≠ 0
2) From the F-distribution table: F(0.01, 1, 8) = 11.2586
3) Test statistic:
   F = MSR/MSE = 17.303
   or, using the p-value approach: p-value = 0.003
4) Rejection region:
   Since the F statistic exceeds the table value (17.303 > 11.2586), we reject H0;
   equivalently, since the p-value 0.003 < 0.01, we reject H0.
5) Thus, there is a linear relationship between the variables
   (months X and hours Y).
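The critical value and p-value quoted in this solution can be reproduced with scipy (an illustrative check; scipy is assumed to be available):

from scipy import stats

f_stat = 17.303                            # F = MSR/MSE from the ANOVA output
dfn, dfd = 1, 8                            # df for regression and for error

f_crit = stats.f.ppf(1 - 0.01, dfn, dfd)   # ≈ 11.2586
p_value = stats.f.sf(f_stat, dfn, dfd)     # ≈ 0.003

print(f_stat > f_crit, p_value < 0.01)     # both True → reject H0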
