
SIMPLE LINEAR REGRESSION

AND CORRELATION

BY : SOEWONO, DRS.

SUMMARY
SIMPLE LINEAR REGRESSION
Managerial decisions are often based on the
relationship between two or more variables.

An objective approach is to collect data on the two variables
and then use statistical procedures to determine how
the variables are related.

Regression analysis is a statistical procedure that can be
used to develop a mathematical equation showing how
variables are related.

In this section we consider the simplest type of regression,
called simple linear regression.

What is SLR?

Situations involving one independent and
one dependent variable, for which the
relationship between the variables is
approximated by a straight line, are called
simple linear regression.

Regression analysis involving two or more
independent variables is called multiple
regression.
THE LEAST SQUARES METHOD


A graph of the available data in which the independent
variable appears on the horizontal axis and the
dependent variable appears on the vertical axis is
called a scatter diagram (scattergram).

The least squares method is a procedure that is
used to find the straight line that provides the
best approximation for the relationship
between the independent and dependent
variables.

The least squares method provides an
estimated regression equation that minimizes
the sum of squared deviations between the
observed values of the dependent variable
and the estimated values of the dependent
variable.
No other straight line produces a sum of
squared deviations (errors) as small as the least squares line does.
In this sense, the regression line is the line that fits the data best.

SIMPLE LINEAR REGRESSION


Regression analysis is a statistical procedure that can be used to
develop a mathematical equation showing how variables are
related.
THE SIMPLEST TYPE OF REGRESSION?
Situations involving one independent and one dependent variable,
for which the relationship between the variables is approximated by
a straight line, are called simple linear regression.

LEAST SQUARES METHOD

1. The least squares method is a procedure that is used to find the
straight line that provides the best approximation for the
relationship between the independent and dependent variables.

The technique that produces this line is called the least squares method (LSM).
This line is called the least squares line, the fitted line, or
the regression line.

LEAST SQUARES METHOD


2. The least squares method provides an estimated regression
equation that minimizes the sum of squared deviations between
the observed values of the dependent variable ($y_i$) and the
estimated values of the dependent variable ($\hat{y}_i$).

The differences between the points and the line are called
residuals.
The minimized sum of squared differences is called SSE, the sum
of squares for error (also called the sum of squares due to error).

No other straight line produces a sum of squared errors as
small as the least squares line does.
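As a rough illustration of the claim that no other straight line gives a smaller sum of squared errors, the Python sketch below fits a line to a small made-up data set and compares its SSE with that of a few nearby lines. The data values and the function names `sse` and `least_squares_line` are illustrative assumptions, not part of these slides.

```python
def sse(a, b, xs, ys):
    """Sum of squared errors for the line y = a + b*x."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))


def least_squares_line(xs, ys):
    """Return (a, b) of the least squares line, using deviations from the means."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b


# Hypothetical data, chosen only for illustration.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

a, b = least_squares_line(xs, ys)
best_sse = sse(a, b, xs, ys)
print(f"least squares line: y = {a:.2f} + {b:.2f}x, SSE = {best_sse:.2f}")

# Every other line tried here (shifted intercept and/or slope) has a larger SSE.
for da, db in [(0.5, 0.0), (-0.5, 0.0), (0.0, 0.1), (0.0, -0.1), (0.3, -0.1)]:
    other = sse(a + da, b + db, xs, ys)
    print(f"intercept {da:+.1f}, slope {db:+.1f}: SSE = {other:.2f}")
    assert other > best_sse
```

Trying a handful of alternative lines is not a proof, but it shows what "minimizes the sum of squared deviations" means in practice.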

1. Introduction
John Maynard Keynes, a great British economist, wanted to
explain fluctuations in consumer spending. He believed that
consumer spending was one of the keys to understanding
economic booms and busts. Keynes hypothesized that
household income was the primary determinant of household
spending.
When income goes up, people spend more; when their income
drops, they spend less.
A simple algebraic representation of Keynes's theory is:

$Y = \alpha + \beta X$
where Y is consumer spending and X is income; $\alpha$ and $\beta$ are
two unknown parameters that describe the relationship
between income and consumption.
Income is the explanatory variable, because changes in
income explain changes in spending. Spending is the dependent
variable, because spending depends on income.

[Figure: three scatter plots of Y against X. Positive relationship: Y increases as X increases. Negative relationship: Y decreases as X increases. No relationship: Y isn't affected by X.]

The term regression was first used as a statistical
concept in 1877 by Sir Francis Galton.
Galton made a study that showed that the height of
children born to tall parents will tend to move back, or
regress, toward the mean height of the population.
Galton called the line describing this relationship a line
of regression.
He designated the word regression as the name of the
general process of predicting one variable (the height of
the children) from another (the height of the parent).
In regression analysis, we shall develop an estimating
equation, that is, a mathematical formula that relates
the known variables to the unknown variable.

2. Type of Regression Model


The first step in any regression analysis is to assemble the data. The
next step is to plot the data.
The term scatter plot is used for the representation of data pairs (x, y),
where y is the dependent (response) variable and x is the explanatory
(regressor, predictor, independent) variable.
In general a data set will consist of n observation points: $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$.
A bivariate data set is necessary to determine the equation
or model which relates the explanatory variable, x, to the dependent
variable, y.
Situations involving one explanatory and one dependent variable, for
which the relationship between the variables is approximated by a
straight line, are called simple linear regression. The regression line
can be used to estimate y for any given value of x. Regression
analysis involving two or more explanatory variables is called multiple
regression.

3. The Linear Regression model


Clearly any regression line must pass as close as possible to all of the
data points. The model is generally used to provide an estimate for y
for any given value of x and so the difference between the value of y
which is actually observed and the corresponding value of y proposed
by the model, the error, should be as small as possible for every data
point.
The equation of the straight-line model could be written in many ways.
Commonly used forms are:
$y = mx + n$;
$y = a + bx$
In the context of regression analysis the equation of the straight line is
usually written as:
$y_i = \beta_0 + \beta_1 x_i$;  $i = 1, 2, \ldots, n$

where $\beta_0$ is the intercept and $\beta_1$ is the slope of the line.
In practice, a perfect straight line passing through all the observation
points never occurs.

The model for the perfect straight line plus an error,
which may be positive or negative, may be written
as:
$y_i = \beta_0 + \beta_1 x_i + e_i$;  $i = 1, 2, \ldots, n$

The true value of the regression constant, $\beta_0$, is estimated
from the sample data as $b_0$ or a.
The true value of the slope, $\beta_1$, is estimated from the
sample data as $b_1$ or b.
The estimated value of y when $x = x_i$ is denoted by $\hat{y}_i$ and
may be calculated by using the equation:

$\hat{y}_i = a + b x_i$
$a = \bar{y} - b\bar{x}$
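The relation between the intercept and the slope, $a = \bar{y} - b\bar{x}$, can be sketched in a few lines of Python. The sketch assumes the slope is already known and borrows the data and the slope 0.4994 from the worked Example 3 later in these notes. The function names are illustrative only.

```python
def intercept_from_slope(b, xs, ys):
    """Intercept a = y_bar - b * x_bar for a given slope b."""
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    return y_bar - b * x_bar


def fitted_values(a, b, xs):
    """Estimated values y_hat_i = a + b * x_i."""
    return [a + b * x for x in xs]


# Aptitude scores (X) and sales records (Y) from Example 3 of these notes.
xs = [18, 26, 28, 34, 36, 42, 48, 52, 54, 60]
ys = [54, 64, 54, 62, 68, 70, 76, 66, 76, 74]

b = 0.4994  # slope as reported in the worked solution of Example 3
a = intercept_from_slope(b, xs, ys)
y_hat = fitted_values(a, b, xs)

x_bar = sum(xs) / len(xs)
y_bar = sum(ys) / len(ys)
print(f"a = {a:.2f}")
print(f"fitted value at x_bar: {a + b * x_bar:.2f}, y_bar: {y_bar:.2f}")
```

Because the intercept is defined through the two means, the fitted line always passes through the point $(\bar{x}, \bar{y})$, whatever the slope.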

[Figure: a line fitted to the data $(x_i, y_i)$; $i = 1, 2, \ldots, n$, by the method of least squares. Each observed point $(x_i, y_i)$ is compared with the fitted point $(x_i, \hat{y}_i)$ on the line; the vertical deviation is $d_i = y_i - \hat{y}_i = e_i$. The point $(\bar{x}, \bar{y})$ is marked on the line.]

Adrien Marie Legendre

VARIABLES IN REGRESSION ANALYSIS

X                       Y
Predictor               Predicted
Independent variable    Dependent variable
Explanatory variable    Explained variable
Stimulus                Response
Exogenous               Endogenous
Known variable          Unknown variable

1. Regression analysis is predicting one variable from the other, using
an estimated straight line that summarizes the relationship between
the variables.
2. Linear regression analysis is predicting one variable from the other,
when the two have a linear relationship.
3. Each of your data points has a residual, which tells you how far the
point is above (or below, if negative) the line.
RESIDUAL = ACTUAL Y − PREDICTED Y ($\hat{Y}$)
= Y − (a + bX)
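A short Python sketch of the residual calculation follows. It uses the father and son heights from Example 2 later in these notes. The helper name `fit_line` is an assumption made for this illustration.

```python
def fit_line(xs, ys):
    """Least squares intercept a and slope b for y = a + b*x."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx
    return y_bar - b * x_bar, b


# Heights of fathers (X) and of their oldest sons (Y), from Example 2.
xs = [68, 64, 70, 72, 69, 74]
ys = [67, 68, 69, 73, 66, 70]

a, b = fit_line(xs, ys)

# RESIDUAL = ACTUAL Y - PREDICTED Y = Y - (a + b*X)
residuals = [y - (a + b * x) for x, y in zip(xs, ys)]

for x, y, e in zip(xs, ys, residuals):
    position = "above" if e > 0 else "below"
    print(f"x = {x}, y = {y}, residual = {e:+.2f} ({position} the line)")

# For a least squares line with an intercept, the residuals sum to zero
# (up to floating-point rounding).
print(f"sum of residuals = {sum(residuals):.10f}")
```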

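4. The Least Squares Method or Method of Least Squares

Simple linear regression analysis is concerned with finding the straight line that
fits the data best. The best fit means that we wish to find the straight line for
which the differences between the actual values $Y_i$ and the values that would be
predicted from the fitted line of regression $\hat{Y}_i$ are as small as possible.
Because these differences will be both positive and negative for different
observations, mathematically we minimize the sum of their squares:

$$\sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2$$

where
$Y_i$ = actual value of Y for observation i
$\hat{Y}_i$ = predicted value of Y for observation i
Since $\hat{Y}_i = a + bX_i$, we are minimizing

$$\sum_{i=1}^{n} (Y_i - a - bX_i)^2$$

A mathematical technique that determines the values of a and b that minimize
this sum is known as the least squares method.
Let $e_i = Y_i - \hat{Y}_i$.
Hence,
$$S = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} (Y_i - a - bX_i)^2 \qquad (*)$$
To minimize (*), we take the partial derivatives with respect to a and b and set
them equal to zero. We get:
$$\frac{\partial S}{\partial a} = -2\sum_{i=1}^{n} (Y_i - a - bX_i) = 0 \qquad (1)$$
$$\frac{\partial S}{\partial b} = -2\sum_{i=1}^{n} X_i (Y_i - a - bX_i) = 0 \qquad (2)$$
From (1):
$$\sum Y_i = na + b\sum X_i, \quad\text{so that}\quad a = \bar{Y} - b\bar{X}$$
From (2), after substituting a from (1):
$$\sum X_i Y_i = a\sum X_i + b\sum X_i^2, \quad\text{so that}\quad b = \frac{n\sum X_i Y_i - \sum X_i \sum Y_i}{n\sum X_i^2 - \left(\sum X_i\right)^2}$$

Hence, the equation of the line of best fit can be written as
$$\hat{Y} = a + bX, \quad\text{or equivalently}\quad \hat{Y} - \bar{Y} = b(X - \bar{X})$$

This line is called the line of regression of Y on X.
The other equation of the line, known as the line of regression of X on Y, is
$$\hat{X} - \bar{X} = b'(Y - \bar{Y}), \qquad b' = \frac{n\sum X_i Y_i - \sum X_i \sum Y_i}{n\sum Y_i^2 - \left(\sum Y_i\right)^2}$$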
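Setting the two partial derivatives to zero gives the standard normal equations, $\sum Y = na + b\sum X$ and $\sum XY = a\sum X + b\sum X^2$. The Python sketch below solves them from the raw sums for a small hypothetical data set and checks that both equations hold. Swapping the roles of X and Y gives the line of regression of X on Y. The data and the function name `regression_from_sums` are assumptions for illustration.

```python
def regression_from_sums(n, sum_x, sum_y, sum_xy, sum_x2):
    """Solve the two normal equations for the intercept a and slope b."""
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    a = (sum_y - b * sum_x) / n
    return a, b


# Hypothetical data, chosen only for illustration.
xs = [2, 4, 6, 8]
ys = [3, 7, 5, 10]

n = len(xs)
sum_x, sum_y = sum(xs), sum(ys)
sum_xy = sum(x * y for x, y in zip(xs, ys))
sum_x2 = sum(x * x for x in xs)
sum_y2 = sum(y * y for y in ys)

# Line of regression of Y on X: Y_hat = a + b*X
a, b = regression_from_sums(n, sum_x, sum_y, sum_xy, sum_x2)

# Line of regression of X on Y: X_hat = a_xy + b_xy*Y (roles of X and Y swapped)
a_xy, b_xy = regression_from_sums(n, sum_y, sum_x, sum_xy, sum_y2)

print(f"Y on X: Y_hat = {a:.4f} + {b:.4f} X")
print(f"X on Y: X_hat = {a_xy:.4f} + {b_xy:.4f} Y")

# Both normal equations are satisfied by (a, b).
print(abs(sum_y - (n * a + b * sum_x)) < 1e-9)
print(abs(sum_xy - (a * sum_x + b * sum_x2)) < 1e-9)
```

The two regression lines are generally different: one minimizes vertical deviations, the other horizontal ones.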

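5. The Coefficient of Determination

The coefficient of determination $r^2$ can be defined as:

$$r^2 = \frac{\text{regression sum of squares}}{\text{total sum of squares}} = \frac{SSR}{SST} \qquad (*)$$

where
$$SST = \sum (Y_i - \bar{Y})^2, \qquad SSR = \sum (\hat{Y}_i - \bar{Y})^2 = SST - SSE$$
and where SSE = $\sum (Y_i - \hat{Y}_i)^2$ is the error sum of squares.

The sample coefficient of correlation r may be obtained from equation (*),
so that:
$$r = \pm\sqrt{r^2}$$
where r takes the sign of the slope b.

The sample coefficient of correlation r can also be computed directly using the
following formula:
$$r = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum (X_i - \bar{X})^2 \, \sum (Y_i - \bar{Y})^2}}$$
or, using the calculator formula:
$$r = \frac{n\sum X_i Y_i - \sum X_i \sum Y_i}{\sqrt{\left[n\sum X_i^2 - \left(\sum X_i\right)^2\right]\left[n\sum Y_i^2 - \left(\sum Y_i\right)^2\right]}}$$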
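The following Python sketch computes SST, SSR and SSE for a small hypothetical data set, forms $r^2 = SSR/SST$, and compares the correlation obtained from $r^2$ with the one from the usual calculator formula $r = (n\sum XY - \sum X \sum Y)/\sqrt{[n\sum X^2 - (\sum X)^2][n\sum Y^2 - (\sum Y)^2]}$. The data and all names are illustrative assumptions.

```python
import math


def fit_line(xs, ys):
    """Least squares intercept a and slope b for y = a + b*x."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx
    return y_bar - b * x_bar, b


# Hypothetical data, chosen only for illustration.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

a, b = fit_line(xs, ys)
y_bar = sum(ys) / len(ys)
y_hat = [a + b * x for x in xs]

sst = sum((y - y_bar) ** 2 for y in ys)               # total sum of squares
ssr = sum((yh - y_bar) ** 2 for yh in y_hat)          # regression sum of squares
sse = sum((y - yh) ** 2 for y, yh in zip(ys, y_hat))  # error sum of squares

r2 = ssr / sst
r_from_r2 = math.copysign(math.sqrt(r2), b)  # r takes the sign of the slope

# Calculator formula based on raw sums.
n = len(xs)
sum_x, sum_y = sum(xs), sum(ys)
sum_xy = sum(x * y for x, y in zip(xs, ys))
sum_x2 = sum(x * x for x in xs)
sum_y2 = sum(y * y for y in ys)
r_calc = (n * sum_xy - sum_x * sum_y) / math.sqrt(
    (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
)

print(f"SST = {sst:.2f}, SSR = {ssr:.2f}, SSE = {sse:.2f}")
print(f"r^2 = {r2:.4f}, r = {r_from_r2:.4f}, calculator r = {r_calc:.4f}")
```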

EXERCISES
1. The director of Graduate Studies at a large college of business would like to be able to
predict the Grade Point Index (GPI) of students in an MBA program based on Graduate
Management Aptitude Test (GMAT) score. A sample of 15 students who had completed 2
years in the program is selected; the results are as follows :
Relating GPI to GMAT score

Observation   GMAT score   GPI      Observation   GMAT score   GPI
1             688          3.72     9             616          3.45
2             647          3.44     10            594          3.33
3             652          3.21     11            567          3.07
4             608          3.29     12            542          2.86
5             680          3.91     13            551          2.91
6             617          3.28     14            573          2.79
7             557          3.02     15            536          3.00
8             599          3.13

(a) Plot a scatter diagram (scatter plot).
(b) Use the least squares method to find the regression coefficients a and b.
(c) Use the regression model to predict the GPI for a student with a GMAT score of 600.

2. Given are five observations taken for two variables X and Y.

Observation i    1     2     3     4     5
Xi               (values not preserved in this copy)
Yi               25    25    20    30    16

(a) Develop a scatter plot for these data.
(b) Use the method of least squares to compute an estimated
regression equation for the data.

3. The following data were collected regarding the monthly
starting salaries and the Grade Point Averages (GPA) for
undergraduate students who had obtained a degree in
political science.

GPA (x)    Monthly Salary ($) (y)
2.6        1100
3.4        1400
3.6        1800
3.2        1300
3.5        1600
2.9        1200

(a) Develop a scattergram for these data.
(b) Use the least squares method to develop the estimated
regression equation.
(c) Predict the monthly starting salary for a student with a
3.0 GPA and for a student with a 3.5 GPA.

4. A real estate agent would like to predict the selling price of
single-family homes. After careful consideration, he concludes
that the variable likely to be most closely related to the selling
price is the size of the house. As an experiment, he takes a
random sample of 15 recently sold houses and records the
selling price (in $1,000) and the size (in 100 ft²) of each.
These data are shown in the accompanying table. Find the
sample regression line for the data.

House Size (x)   Selling Price (y)     House Size (x)   Selling Price (y)
20.0             89.5                  24.3             119.9
14.8             79.9                  20.2             87.6
20.5             83.1                  22.0             112.6
12.5             56.9                  19.0             120.8
18.0             66.6                  12.3             78.5
14.3             82.5                  14.0             74.3
27.5             126.3                 16.7             74.8
16.5             79.3

5. Students in a small class were polled by a surveyor attempting to
establish a relationship between hours of study in the week
immediately preceding a major midterm exam and the marks
received on the exam. The surveyor gathered the data listed in the
accompanying table.

Hours of Study (x)    Exam Score (y)
25                    93
12                    57
18                    55
26                    90
19                    82
20                    95
23                    95
15                    80
22                    85
(not preserved)       61

(a) Find the equation of the regression line to help predict the exam
score on the basis of study hours.
(b) If a student studies 16 hours, what exam score is predicted?

EXAMPLES
1. Suppose the data in the table below represent
the grade point averages of 15 recent
graduates and their starting annual salaries.

2/23/15

SWN/PROBABILITY AND
STATISTIC

28

GPA     Starting salary
2.95    18.5
3.20    20.0
3.40    21.1
3.60    22.4
3.20    21.2
2.85    15.0
3.10    18.0
2.85    18.8
3.05    15.7
2.70    14.4
2.75    15.5
3.10    17.2
3.15    19.0
2.95    17.2
2.75    16.8

a) Determine a regression equation for average
starting salary as a function of grade point average.
b) Determine the sample coefficient of correlation r
(correlation coefficient).
Note:
regression equation = regression line =
least squares line =
least squares prediction equation.
The methodology used to obtain this line is called the
least squares method (method of least squares).

GPA (X)   SALARY (Y)   XY         X²         Y²         ESTIMATED SALARY
2.95      18.5         54.575     8.7025     342.25     17.32
3.20      20.0         64.000     10.2400    400.00     19.35
3.40      21.1         71.740     11.5600    445.21     20.98
3.60      22.4         80.640     12.9600    501.76     22.60
3.20      21.2         67.840     10.2400    449.44     19.35
2.85      15.0         42.750     8.1225     225.00     16.51
3.10      18.0         55.800     9.6100     324.00     18.54
2.85      18.8         53.580     8.1225     353.44     16.51
3.05      15.7         47.885     9.3025     246.49     18.13
2.70      14.4         38.880     7.2900     207.36     15.29
2.75      15.5         42.625     7.5625     240.25     15.70
3.10      17.2         53.320     9.6100     295.84     18.54
3.15      19.0         59.850     9.9225     361.00     18.95
2.95      17.2         50.740     8.7025     295.84     17.32
2.75      16.8         46.200     7.5625     282.24     15.70
Totals
45.6      270.8        830.425    139.5100   4970.12    270.79

a) Thus the estimated regression equation, computed from the column totals with n = 15, is:
$\hat{Y} = -6.627 + 8.119X$

b) Correlation coefficient:
$r \approx 0.848$
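The column totals and the results for this example can be checked with a short script. The sketch below is written in plain Python with illustrative variable names. It recomputes the sums from the 15 data pairs and then applies the usual sum-based formulas for b, a and r.

```python
import math

# GPA (X) and starting salary (Y) for the 15 graduates in the table.
gpa = [2.95, 3.20, 3.40, 3.60, 3.20, 2.85, 3.10, 2.85,
       3.05, 2.70, 2.75, 3.10, 3.15, 2.95, 2.75]
salary = [18.5, 20.0, 21.1, 22.4, 21.2, 15.0, 18.0, 18.8,
          15.7, 14.4, 15.5, 17.2, 19.0, 17.2, 16.8]

n = len(gpa)
sum_x = sum(gpa)
sum_y = sum(salary)
sum_xy = sum(x * y for x, y in zip(gpa, salary))
sum_x2 = sum(x * x for x in gpa)
sum_y2 = sum(y * y for y in salary)

# Slope, intercept and correlation from the raw sums.
b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
a = (sum_y - b * sum_x) / n
r = (n * sum_xy - sum_x * sum_y) / math.sqrt(
    (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
)

# Estimated salary for each graduate (last column of the table).
estimated = [a + b * x for x in gpa]

print(f"sums: X = {sum_x:.2f}, Y = {sum_y:.2f}, XY = {sum_xy:.3f}, "
      f"X^2 = {sum_x2:.4f}, Y^2 = {sum_y2:.2f}")
print(f"Y_hat = {a:.3f} + {b:.3f} X")
print(f"r = {r:.3f}")
```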


2. The heights of fathers, X, and the
heights of their oldest sons when grown,
Y, are given as measurements to the
nearest inch.

X    68   64   70   72   69   74
Y    67   68   69   73   66   70

a) Construct a scattergram (scatter plot).
b) Find the equation of the least squares
regression line.
c) Compute the coefficient of correlation.

SOLUTION
a) Draw the scatter plot yourself on rectangular (orthogonal) coordinate axes.

b)
X      Y      X²       Y²       XY
68     67     4624     4489     4556
64     68     4096     4624     4352
70     69     4900     4761     4830
72     73     5184     5329     5256
69     66     4761     4356     4554
74     70     5476     4900     5180
Totals
417    413    29041    28459    28728


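Thus, the equation of the regression line, computed from the column totals with n = 6, is:
$\hat{Y} = 40.22 + 0.412X$

c) Coefficient of correlation:
$r \approx 0.572$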
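When only the column totals are at hand, the line and the correlation coefficient can be obtained from them directly. The Python sketch below starts from the totals of the solution table (n = 6) and applies the usual sum-based formulas. The variable names and the helper `predict` are illustrative assumptions.

```python
import math

# Column totals from the solution table (n = 6 father/son pairs).
n = 6
sum_x, sum_y = 417, 413
sum_x2, sum_y2, sum_xy = 29041, 28459, 28728

# Building blocks shared by the slope and the correlation formulas.
sxy = n * sum_xy - sum_x * sum_y   # 147
sxx = n * sum_x2 - sum_x ** 2      # 357
syy = n * sum_y2 - sum_y ** 2      # 185

b = sxy / sxx
a = (sum_y - b * sum_x) / n
r = sxy / math.sqrt(sxx * syy)


def predict(x):
    """Predicted son's height for a father of height x (inches)."""
    return a + b * x


print(f"Y_hat = {a:.2f} + {b:.3f} X")
print(f"r = {r:.3f}")
print(f"predicted height for X = 70: {predict(70):.1f}")
```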


3. A company would like to predict how the
trainees in its salesmanship course will
perform. At the beginning of their two-month
course, the trainees are given an
aptitude test. This is the X-score shown
below. Records are kept of the sales records
of each salesman and constitute the Y-values.

X    18   26   28   34   36   42   48   52   54   60
Y    54   64   54   62   68   70   76   66   76   74

a. Plot these data.
b. Find the regression line relating performance on
the test to sales.
c. Compute the coefficient of correlation.

Solution
Tabulate the data to compute $\sum X$, $\sum Y$, $\sum X^2$, $\sum Y^2$, and $\sum XY$.

X      Y      X²       XY       Y²
18     54     324      972      2916
26     64     676      1664     4096
28     54     784      1512     2916
34     62     1156     2108     3844
36     68     1296     2448     4624
42     70     1764     2940     4900
48     76     2304     3648     5776
52     66     2704     3432     4356
54     76     2916     4104     5776
60     74     3600     4440     5476
Totals
398    664    17524    27268    44680

From these totals, b and a can be computed:

$\hat{Y} = 46.52 + 0.4994X$
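The stated line can be checked, and part (c) answered, with a short Python sketch. It recomputes the sums from the ten data pairs. The variable names are illustrative. The correlation uses the usual sum-based formula.

```python
import math

# Aptitude test scores (X) and sales records (Y) of the ten trainees.
xs = [18, 26, 28, 34, 36, 42, 48, 52, 54, 60]
ys = [54, 64, 54, 62, 68, 70, 76, 66, 76, 74]

n = len(xs)
sum_x, sum_y = sum(xs), sum(ys)
sum_x2 = sum(x * x for x in xs)
sum_y2 = sum(y * y for y in ys)
sum_xy = sum(x * y for x, y in zip(xs, ys))

# Slope and intercept of the least squares line Y_hat = a + b*X.
b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
a = (sum_y - b * sum_x) / n

# Coefficient of correlation and coefficient of determination.
r = (n * sum_xy - sum_x * sum_y) / math.sqrt(
    (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
)
r2 = r ** 2

print(f"sums: X = {sum_x}, Y = {sum_y}, X^2 = {sum_x2}, "
      f"XY = {sum_xy}, Y^2 = {sum_y2}")
print(f"Y_hat = {a:.2f} + {b:.4f} X")
print(f"r = {r:.3f}, r^2 = {r2:.3f}")
```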

1. The advertising expense and profit of a
company for each of five years are given
below:

Advertising expense (in thousands $)    Profit (in thousands $)
2                                       15
7                                       50
10                                      110
15                                      220
20                                      200

a. Find the equation of the least squares line.
b. If advertising expense is to be $9,000 for a particular
year, predict the profit for that year.

2. The mathematics SAT scores and the
college grade point averages of 8 students
are given below:

Math SAT score (x)    College Grade Point Average (y)
600                   3.20
550                   3.00
500                   3.00
650                   3.50
625                   2.80
480                   2.60
700                   3.60
580                   3.10

Calculate the correlation coefficient r.

3. A personnel manager wants to predict the salary for
a systems analyst based on number of years of
experience. A random sample of 12 systems analysts
produces the following results:

Years experience    Salary (thousands)
5.5                 19.9
9.0                 25.5
4.0                 23.9
8.0                 24.0
9.5                 22.5
3.0                 20.5
7.0                 21.0
1.5                 17.7
8.5                 30.0
7.5                 25.0
9.5                 21.0
6.0                 18.6

a. Find the least squares line.
b. Predict the salary of a systems analyst with 5
years of experience.
