
CHAPTER 4:

INTRODUCTORY
LINEAR REGRESSION

CHAPTER OUTLINE:
4.1 SIMPLE LINEAR REGRESSION

- Curve fitting

- Inferences about estimated parameters

- Adequacy of the models

- Linear correlation

4.2 Multiple Linear Regression

Introduction:
 Regression is a statistical procedure for establishing the
relationship between two or more variables.

 This is done by fitting a linear equation to the observed
data.

 The regression line is then used by the researcher to see
the trend and make predictions of values for the data.

 There are two types of relationship:

 Simple (2 variables)
 Multiple (more than 2 variables)
4.1 The Simple Linear Regression Model
 is an equation that describes a dependent variable (Y) in
terms of an independent variable (X) plus a random error, ε:

Y = β0 + β1X + ε

where,
β0 = intercept of the line with the Y-axis
β1 = slope of the line
ε = random error

 The random error, ε, is the difference between a data point and
the corresponding deterministic value.

 The regression line is estimated from the collected data by
fitting a straight line to the data set and obtaining the
equation of that line:

Ŷ = β̂0 + β̂1X
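To make the roles of β0, β1 and ε concrete, here is a minimal Python sketch (illustrative only; the coefficient values, the noise level and the use of numpy are assumptions, not taken from these notes) that simulates data from the model Y = β0 + β1X + ε.

import numpy as np

rng = np.random.default_rng(seed=1)

beta0, beta1 = 2.0, 0.5                    # assumed intercept and slope
sigma = 1.0                                # assumed standard deviation of the random error

x = np.linspace(0, 10, 30)                 # independent variable X
eps = rng.normal(0.0, sigma, size=x.size)  # random error ε with mean zero
y = beta0 + beta1 * x + eps                # dependent variable Y = β0 + β1X + ε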
Example 4.1:

1) A nutritionist studying weight-loss programs might
want to find out whether reducing carbohydrate intake
can help a person reduce weight.
a) X is the carbohydrate intake (independent
variable).
b) Y is the weight (dependent variable).

2) An entrepreneur might want to know whether
increasing the cost of packaging his new product will
have an effect on the sales volume.
a) X is the cost.
b) Y is the sales volume.
4.1.1 CURVE FITTING (SCATTER PLOTS)

 A scatter plot is a graph of ordered pairs (x, y).

 The purpose of a scatter plot is to describe the
nature of the relationship between the
independent variable, X, and the dependent variable,
Y, in a visual way.

 The independent variable, x, is plotted on the
horizontal axis and the dependent variable, y, is
plotted on the vertical axis.
SCATTER DIAGRAM
 Positive Linear Relationship

[Figure: plot of E(y) against x showing a regression line with
intercept β0 and a positive slope β1.]
SCATTER DIAGRAM
 Negative Linear Relationship

[Figure: plot of E(y) against x showing a regression line with
intercept β0 and a negative slope β1.]
SCATTER DIAGRAM
 No Relationship

[Figure: plot of E(y) against x showing a horizontal regression line
with intercept β0 and slope β1 = 0.]
LINEAR REGRESSION MODEL

 A linear regression line can be developed from a
freehand plot of the data.
Example 4.2:
The table below contains values for two variables, X
and Y. Plot the given data and draw a freehand
estimate of the regression line.

X   -3   -2   -1    0    1    2    3
Y    1    2    3    5    8   11   12
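A scatter plot like the one asked for in Example 4.2 can also be produced in Python; the sketch below is only an illustrative aid (matplotlib and numpy are assumed to be available, and the fitted line is added just for comparison with a freehand estimate).

import numpy as np
import matplotlib.pyplot as plt

x = np.array([-3, -2, -1, 0, 1, 2, 3])
y = np.array([1, 2, 3, 5, 8, 11, 12])

plt.scatter(x, y, label="observed (x, y) pairs")   # the scatter plot

b1, b0 = np.polyfit(x, y, deg=1)                   # least squares slope and intercept
plt.plot(x, b0 + b1 * x, label="fitted line")      # line to compare with a freehand estimate

plt.xlabel("X (independent variable)")
plt.ylabel("Y (dependent variable)")
plt.legend()
plt.show()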
4.1.2 INFERENCES ABOUT ESTIMATED PARAMETERS

Least Squares Method

• The least squares method is commonly used to
determine the values of β̂0 and β̂1 that give the best
fit of the estimated regression line to the sample
data points.

• The straight line fitted to the data set is the line:

ŷ = β̂0 + β̂1x
LEAST SQUARES METHOD
Theorem:
 Given the sample data (xi, yi), i = 1, 2, ..., n, the
coefficients of the least squares line ŷ = β̂0 + β̂1x are:

i) y-intercept of the estimated regression equation:

β̂0 = ȳ - β̂1x̄

where x̄ and ȳ are the means of x and y respectively.
LEAST SQUARES METHOD
ii) Slope of the estimated regression equation:

β̂1 = Sxy / Sxx

where

Sxy = Σ xi·yi - (Σ xi)(Σ yi) / n

Sxx = Σ xi² - (Σ xi)² / n

Syy = Σ yi² - (Σ yi)² / n

(all sums taken over i = 1, ..., n)
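As a minimal sketch of how these formulas translate into code (illustrative only; the function name and the use of numpy are assumptions, not part of the slides):

import numpy as np

def least_squares_fit(x, y):
    """Return (b0_hat, b1_hat) for the fitted line y-hat = b0_hat + b1_hat * x."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.size

    sxy = np.sum(x * y) - x.sum() * y.sum() / n   # Sxy
    sxx = np.sum(x ** 2) - x.sum() ** 2 / n       # Sxx

    b1_hat = sxy / sxx                            # slope β̂1 = Sxy / Sxx
    b0_hat = y.mean() - b1_hat * x.mean()         # intercept β̂0 = ȳ - β̂1·x̄
    return b0_hat, b1_hat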
LEAST SQUARES METHOD

• Given any value xi of the independent variable, the
predicted value of the dependent variable, ŷ, can be
found by substituting xi into the equation

ŷ = β̂0 + β̂1x
Example 4.3: Students' scores in history
The data below represent the scores obtained by ten primary
school students before and after they were taken on a
tour to the museum (which is supposed to increase their
interest in history).
Before, x   65  63  76  46  68  72  68  57  36  96
After,  y   68  66  86  48  65  66  71  57  42  87

a) Fit a linear regression model with "before" as the
explanatory variable and "after" as the dependent variable.

b) Predict the score a student would obtain "after" if he scored
60 marks "before".
Solution
n = 10        Σxy = 44435
Σx = 647      Σx² = 44279      x̄ = 64.7
Σy = 656      Σy² = 44884      ȳ = 65.6

Sxy = 44435 - (647)(656)/10 = 1991.8

Sxx = 44279 - (647)²/10 = 2418.1

Syy = 44884 - (656)²/10 = 1850.4
a) β̂1 = Sxy / Sxx = 1991.8 / 2418.1 = 0.8237

   β̂0 = ȳ - β̂1x̄ = 65.6 - (0.8237)(64.7) = 12.3063

   Ŷ = 12.3063 + 0.8237x

b) x = 60

   Ŷ = 12.3063 + 0.8237(60) = 61.7283
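The hand calculation in Example 4.3 can be checked numerically; the sketch below (an illustrative check, not part of the original solution, with numpy assumed) reproduces parts (a) and (b).

import numpy as np

before = np.array([65, 63, 76, 46, 68, 72, 68, 57, 36, 96])   # x
after = np.array([68, 66, 86, 48, 65, 66, 71, 57, 42, 87])    # y

n = before.size
sxy = np.sum(before * after) - before.sum() * after.sum() / n   # 1991.8
sxx = np.sum(before ** 2) - before.sum() ** 2 / n               # 2418.1

b1 = sxy / sxx                            # ≈ 0.8237
b0 = after.mean() - b1 * before.mean()    # ≈ 12.3063

print(b0 + b1 * 60)                       # predicted "after" score ≈ 61.73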
4.1.3 ADEQUACY OF THE MODEL
COEFFICIENT OF DETERMINATION (R²)
 The coefficient of determination is a measure of the
variation of the dependent variable (Y) that is
explained by the regression line and the
independent variable (X).
 The symbol for the coefficient of determination is r²
or R².
 If r = 0.90, then r² = 0.81. It means that 81% of the
variation in the dependent variable (Y) is accounted
for by the variation in the independent variable (X).
 The rest of the variation, 0.19 or 19%, is unexplained
and is called the coefficient of nondetermination.
 The formula for the coefficient of nondetermination is
1.00 - r².

COEFFICIENT OF DETERMINATION (R²)
 Relationship Among SST, SSR, SSE

SST = SSR + SSE

Σ(yi - ȳ)² = Σ(ŷi - ȳ)² + Σ(yi - ŷi)²

where:
SST = total sum of squares
SSR = sum of squares due to regression
SSE = sum of squares due to error

 The coefficient of determination is:

r² = SSR / SST = (Sxy)² / (Sxx·Syy)

where:
SSR = sum of squares due to regression
SST = total sum of squares
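To make the SST = SSR + SSE decomposition concrete, the sketch below (illustrative only, reusing the Example 4.3 data, with numpy assumed) computes r² both as SSR/SST and as (Sxy)²/(Sxx·Syy) and shows that the two agree.

import numpy as np

x = np.array([65, 63, 76, 46, 68, 72, 68, 57, 36, 96])
y = np.array([68, 66, 86, 48, 65, 66, 71, 57, 42, 87])

n = x.size
sxy = np.sum(x * y) - x.sum() * y.sum() / n
sxx = np.sum(x ** 2) - x.sum() ** 2 / n
syy = np.sum(y ** 2) - y.sum() ** 2 / n

b1 = sxy / sxx
b0 = y.mean() - b1 * x.mean()
y_hat = b0 + b1 * x                         # fitted values

sst = np.sum((y - y.mean()) ** 2)           # total sum of squares
ssr = np.sum((y_hat - y.mean()) ** 2)       # sum of squares due to regression
sse = np.sum((y - y_hat) ** 2)              # sum of squares due to error

print(ssr + sse, sst)                       # SSR + SSE = SST
print(ssr / sst, sxy ** 2 / (sxx * syy))    # both give r² ≈ 0.8866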
Example 4.4

1) If r = 0.919, find the value of r² and interpret it.

Solution:
r² = 0.84. It means that 84% of the variation in the
dependent variable (Y) is explained by the
variation in the independent variable (X).
4.1.4 Linear Correlation (r)
 Correlation measures the strength of a linear
relationship between the two variables.
 Also known as Pearson’s product moment
coefficient of correlation.
 The symbol for the sample coefficient of correlation
is r; for the population it is ρ.
 Formula:

r = Sxy / √(Sxx·Syy)

or, equivalently,

r = (sign of β̂1)·√r²
Properties of r :

 1  r  1
 Values of r close to 1 implies there is a strong
positive linear relationship between x and y.
 Values of r close to -1 implies there is a strong
negative linear relationship between x and y.
 Values of r close to 0 implies little or no linear
relationship between x and y.

Refer to Example 4.3: Students' scores in history

c) Calculate the value of r and interpret its meaning.

Solution:

r = Sxy / √(Sxx·Syy)
  = 1991.8 / √((2418.1)(1850.4))
  = 0.9416

Thus, there is a strong positive linear
relationship between the scores obtained before (x)
and after (y) the tour.
Refer to Example 4.3:

The sign of β̂1 in the equation ŷ = 12.3063 + 0.8237x is "+".

Calculate r² first, then use the second formula for r:

r² = 0.8866
r = +√0.8866 = +0.9416
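As a quick numerical check (illustrative only, not from the slides; numpy is assumed), both formulas for r can be evaluated and compared with numpy's built-in correlation coefficient.

import numpy as np

x = np.array([65, 63, 76, 46, 68, 72, 68, 57, 36, 96])
y = np.array([68, 66, 86, 48, 65, 66, 71, 57, 42, 87])

n = x.size
sxy = np.sum(x * y) - x.sum() * y.sum() / n
sxx = np.sum(x ** 2) - x.sum() ** 2 / n
syy = np.sum(y ** 2) - y.sum() ** 2 / n

r_direct = sxy / np.sqrt(sxx * syy)                               # r = Sxy / √(Sxx·Syy) ≈ 0.9416
r_from_r2 = np.sign(sxy / sxx) * np.sqrt(sxy ** 2 / (sxx * syy))  # (sign of slope)·√r²

print(r_direct, r_from_r2, np.corrcoef(x, y)[0, 1])               # all three agree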
Assumptions About the Error Term ε

1. The error ε is a random variable with mean zero.

2. The variance of ε, denoted by σ², is the same for
all values of the independent variable.

3. The values of ε are independent.

4. The error ε is a normally distributed random
variable.
4.1.5 TEST OF SIGNIFICANCE

 To determine whether X provides information in
predicting Y, we proceed with testing the
hypothesis.

 Two tests are commonly used:

i) t-test

ii) F-test
1) t-Test
1. Determine the hypotheses.

   H0: β1 = 0 (no linear relationship)
   H1: β1 ≠ 0 (a linear relationship exists)

2. Determine the critical value at significance level α.

   t(α/2, n-2), or use the p-value approach.

3. Compute the test statistic.

   t = β̂1 / √Var(β̂1)

   where

   Var(β̂1) = [(Syy - β̂1·Sxy) / (n - 2)] / Sxx
1) t-Test
4. Determine the rejection rule.

   Reject H0 if:

   t < -t(α/2, n-2)  or  t > t(α/2, n-2)

   or if p-value < α.

5. Conclusion.

   If H0 is rejected, conclude that there is a significant
   relationship between variables X and Y.
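A hedged sketch of the t-test above as a reusable Python function (illustrative only; the function name is an assumption, and scipy is assumed to be available for the critical value):

import numpy as np
from scipy import stats

def t_test_slope(x, y, alpha=0.05):
    """Test H0: beta1 = 0 against H1: beta1 != 0 in simple linear regression."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    n = x.size

    sxy = np.sum(x * y) - x.sum() * y.sum() / n
    sxx = np.sum(x ** 2) - x.sum() ** 2 / n
    syy = np.sum(y ** 2) - y.sum() ** 2 / n

    b1 = sxy / sxx
    var_b1 = ((syy - b1 * sxy) / (n - 2)) / sxx    # Var(β̂1)
    t_stat = b1 / np.sqrt(var_b1)

    t_crit = stats.t.ppf(1 - alpha / 2, df=n - 2)  # t(α/2, n-2)
    return t_stat, t_crit, abs(t_stat) > t_crit    # reject H0 if |t| exceeds the critical value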
2) F-Test
1. Determine the hypotheses.

   H0: β1 = 0 (no linear relationship)
   H1: β1 ≠ 0 (a linear relationship exists)

2. Specify the level of significance.

   Use the F distribution with 1 degree of freedom (df) in the
   numerator and (n - 2) degrees of freedom in the denominator.

3. Compute the test statistic.

   F = MSR / MSE

4. Determine the rejection rule.

   Reject H0 if:
   p-value < α, or F > F(α, 1, n-2).
2) F-Test
5. Conclusion.

   If H0 is rejected, conclude that there is a significant
   relationship between variables X and Y.
Refer to Example 4.3: Students' scores in history

d) Test whether the scores before and after the trip are
related. Use α = 0.05.

Solution:

1. H0: β1 = 0 (no linear relationship)
   H1: β1 ≠ 0 (a linear relationship exists)

2. α = 0.05, t(0.025, 8) = 2.306

3. Var(β̂1) = [(Syy - β̂1·Sxy) / (n - 2)] / Sxx
           = [(1850.4 - (0.8237)(1991.8)) / 8] / 2418.1
           = 0.0108

   t = β̂1 / √Var(β̂1) = 0.8237 / √0.0108 = 7.926
4. Rejection rule:
   t > t(0.025, 8), since 7.926 > 2.306.

5. Conclusion:
   Thus, we reject H0. The score before the trip (x) has a
   significant linear relationship with the score after the
   trip (y).
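The arithmetic in steps 2-3 can be reproduced from the rounded intermediate values on this slide (an illustrative check; scipy is assumed for the critical value):

import numpy as np
from scipy import stats

b1 = 0.8237                                   # β̂1 from part (a)
syy, sxy, sxx, n = 1850.4, 1991.8, 2418.1, 10

var_b1 = ((syy - b1 * sxy) / (n - 2)) / sxx   # ≈ 0.0108
t_stat = b1 / np.sqrt(var_b1)                 # ≈ 7.9 (the slide's 7.926 uses Var rounded to 0.0108)

t_crit = stats.t.ppf(1 - 0.05 / 2, df=n - 2)  # ≈ 2.306
print(t_stat > t_crit)                        # True → reject H0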
ANALYSIS OF VARIANCE (ANOVA)
The value of the test statistic F for an ANOVA test is
calculated as:

F = MSR / MSE

 To calculate MSR and MSE, first compute the
regression sum of squares (SSR) and the error sum of
squares (SSE).
ANALYSIS OF VARIANCE (ANOVA)
General form of the ANOVA table:

Source of     Degrees of     Sum of     Mean Squares       Value of the
Variation     Freedom (df)   Squares                       Test Statistic
Regression    1              SSR        MSR = SSR/1
Error         n - 2          SSE        MSE = SSE/(n-2)    F = MSR/MSE
Total         n - 1          SST

ANOVA Test
1) Hypotheses: H0: β1 = 0
               H1: β1 ≠ 0
2) Select the distribution to use: the F distribution.
3) Calculate the value of the test statistic F.
4) Determine the rejection and non-rejection regions.
5) Make a decision: reject H0 or do not reject H0.
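A short sketch of this ANOVA computation, applied for illustration to the Example 4.3 data (not part of the original slides; numpy and scipy are assumed):

import numpy as np
from scipy import stats

x = np.array([65, 63, 76, 46, 68, 72, 68, 57, 36, 96])
y = np.array([68, 66, 86, 48, 65, 66, 71, 57, 42, 87])

n = x.size
b1, b0 = np.polyfit(x, y, deg=1)
y_hat = b0 + b1 * x

ssr = np.sum((y_hat - y.mean()) ** 2)      # regression sum of squares, df = 1
sse = np.sum((y - y_hat) ** 2)             # error sum of squares, df = n - 2

msr = ssr / 1
mse = sse / (n - 2)
f_stat = msr / mse                         # equals t² in simple linear regression

f_crit = stats.f.ppf(1 - 0.05, dfn=1, dfd=n - 2)
print(f_stat, f_crit, f_stat > f_crit)     # reject H0 if the statistic exceeds the critical value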
Example 4.5
The manufacturer of Cardio Glide exercise equipment wants to study the
relationship between the number of months since the glide was purchased
and the length of time the equipment was used last week.

1) Determine the regression equation.

2) At α = 0.01, test whether there is a linear relationship between the
variables.
Solution (1):

Regression equation:

Ŷ = 9.939 - 0.637X
Solution (2):
1) Hypotheses:
   H0: β1 = 0
   H1: β1 ≠ 0
2) From the F-distribution table: F(0.01, 1, 8) = 11.2586
3) Test statistic:
   F = MSR/MSE = 17.303
   or, using the p-value approach: p-value = 0.003
4) Rejection region:
   Since the F statistic exceeds the table value (17.303 > 11.2586), we reject H0;
   equivalently, since the p-value 0.003 < 0.01, we reject H0.
5) Thus, there is a linear relationship between the variables
   (months X and hours Y).
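The critical value and p-value quoted in this solution can be reproduced with scipy (an illustrative check; scipy is assumed to be available):

from scipy import stats

f_stat = 17.303                            # F = MSR/MSE from the ANOVA output
dfn, dfd = 1, 8                            # df for regression and for error

f_crit = stats.f.ppf(1 - 0.01, dfn, dfd)   # ≈ 11.2586
p_value = stats.f.sf(f_stat, dfn, dfd)     # ≈ 0.003

print(f_stat > f_crit, p_value < 0.01)     # both True → reject H0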
