Regression and Correlation

Engineering Probability and Statistics
Introduction
Linear Regression
• Given paired data (x, y), a regression equation can be fitted that expresses y in terms of x.
• The regression equation may be of degree one (linear), two (quadratic), three (cubic), or higher (nth-order polynomial).
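A polynomial of any degree is fitted with the same least-squares idea: its coefficients solve a small system of normal equations. The sketch below is a minimal pure-Python illustration under stated assumptions: ordinary least squares, Gaussian elimination, and x centered at its mean so the powers stay small. The function names (`solve`, `polyfit`, `sse`) are hypothetical. It fits degrees 1, 2 and 3 to the proficiency exam data from the Sample Data table and reports the sum of squared errors (SSE). For nested least-squares fits, the SSE can only stay the same or decrease as the degree rises.

```python
# Sketch: least-squares polynomial fit via the normal equations (pure Python).


def solve(matrix, rhs):
    """Solve a small square linear system by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    a = [list(row) + [b] for row, b in zip(matrix, rhs)]  # augmented copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n + 1):
                a[r][c] -= factor * a[col][c]
    coeffs = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        tail = sum(a[r][c] * coeffs[c] for c in range(r + 1, n))
        coeffs[r] = (a[r][n] - tail) / a[r][r]
    return coeffs


def polyfit(u, y, degree):
    """Coefficients c[0..degree] minimizing the sum of (y - c0 - c1*u - c2*u^2 - ...)^2."""
    m = degree + 1
    # Normal equations: sum_k c_k * sum(u^(j+k)) = sum(u^j * y), for j = 0..degree
    matrix = [[sum(ui ** (j + k) for ui in u) for k in range(m)] for j in range(m)]
    rhs = [sum(ui ** j * yi for ui, yi in zip(u, y)) for j in range(m)]
    return solve(matrix, rhs)


def sse(u, y, coeffs):
    """Sum of squared vertical distances from the points to the fitted polynomial."""
    return sum((yi - sum(c * ui ** k for k, c in enumerate(coeffs))) ** 2
               for ui, yi in zip(u, y))


x = [60, 90, 70, 85, 80, 65, 75, 60, 75, 70]  # proficiency exam score
y = [70, 95, 70, 75, 90, 75, 75, 60, 80, 65]  # course grade

# Work with u = x - mean(x) so the powers of u stay small (better conditioned).
x_bar = sum(x) / len(x)
u = [xi - x_bar for xi in x]

fits = {d: polyfit(u, y, d) for d in (1, 2, 3)}
for d, coeffs in fits.items():
    print(f"degree {d}: coefficients in u = {[round(c, 4) for c in coeffs]}, "
          f"SSE = {sse(u, y, coeffs):.3f}")
```

Because u = x − x̄, the degree-1 result gives the fitted value at x = x̄ (which equals ȳ = 75.5) together with the slope of about 0.835.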
Linear regression
From analytic geometry, the equation of a straight line is
$$\hat{y} = A + Bx$$
where the coefficients A and B represent the y-intercept and the slope, respectively. The symbol $\hat{y}$ is used to distinguish the predicted value given by the regression line from an actual observed value y for a given value of x.
Sample Data
Student Number   Proficiency Exam Score (x)   Course Grade (y)
1 60 70
2 90 95
3 70 70
4 85 75
5 80 90
6 65 75
7 75 75
8 60 60
9 75 80
10 70 65
Scatter Diagram and the Best-fit Line
[Figure: Proficiency Exam and Course Grade of 10 Students. Scatter diagram with the best-fit line; x-axis: Proficiency Exam Score (0 to 100), y-axis: Course Grade (0 to 100).]
Method of Least Squares
• The least-squares procedure selects that
particular line for which the sum of the
squares of the vertical distances from the
observed points to the line is as small as
possible.
• Normal equations:
$$nA + B\sum x = \sum y$$
$$A\sum x + B\sum x^2 = \sum xy$$
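The normal equations form a 2×2 linear system in A and B. The sketch below is a pure-Python illustration, not taken from the slides: it builds the sums from the sample data and solves the system with Cramer's rule. Carrying full precision gives A ≈ 14.53 and B ≈ 0.835. The hand calculation later in this section rounds B to 0.84 before computing A, which is why it arrives at A = 14.18.

```python
# Sketch: solve the two normal equations for A (intercept) and B (slope).

x = [60, 90, 70, 85, 80, 65, 75, 60, 75, 70]  # proficiency exam score
y = [70, 95, 70, 75, 90, 75, 75, 60, 80, 65]  # course grade

n = len(x)
sum_x = sum(x)
sum_y = sum(y)
sum_xy = sum(xi * yi for xi, yi in zip(x, y))
sum_x2 = sum(xi * xi for xi in x)

# Normal equations in matrix form:
#   [ n      sum_x  ] [A]   [ sum_y  ]
#   [ sum_x  sum_x2 ] [B] = [ sum_xy ]
det = n * sum_x2 - sum_x * sum_x
A = (sum_y * sum_x2 - sum_x * sum_xy) / det  # Cramer's rule, first unknown
B = (n * sum_xy - sum_x * sum_y) / det       # Cramer's rule, second unknown

print(f"A = {A:.4f}, B = {B:.4f}")  # A = 14.5330, B = 0.8352
```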
Regression Coefficients
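$$B = \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - \left(\sum x\right)^2}$$
$$A = \bar{y} - B\bar{x}$$
where $\bar{x} = \sum x / n$ and $\bar{y} = \sum y / n$ are the means of x and y.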
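The two coefficient formulas translate directly into a small function. This is a minimal sketch; the name `regression_coefficients` and the error checks are illustrative additions. The first check uses hypothetical points lying exactly on y = 3 + 2x, so the expected result is known in advance.

```python
def regression_coefficients(x, y):
    """Return (A, B) for y-hat = A + B*x using the closed-form least-squares formulas."""
    n = len(x)
    if n < 2:
        raise ValueError("need at least two points")
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi * xi for xi in x)
    denom = n * sum_x2 - sum_x ** 2
    if denom == 0:
        raise ValueError("all x values are equal; the slope is undefined")
    B = (n * sum_xy - sum_x * sum_y) / denom
    A = sum_y / n - B * sum_x / n  # A = y_bar - B * x_bar
    return A, B


# Hypothetical points lying exactly on y = 3 + 2x.
print(regression_coefficients([0, 1, 2, 3], [3, 5, 7, 9]))  # (3.0, 2.0)

# Proficiency exam data (x) and course grades (y).
A, B = regression_coefficients([60, 90, 70, 85, 80, 65, 75, 60, 75, 70],
                               [70, 95, 70, 75, 90, 75, 75, 60, 80, 65])
print(f"A = {A:.2f}, B = {B:.4f}")
```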
Sample Calculation
Student x y xy x^2 y^2
1 60 70 4200 3600 4900
2 90 95 8550 8100 9025
3 70 70 4900 4900 4900
4 85 75 6375 7225 5625
5 80 90 7200 6400 8100
6 65 75 4875 4225 5625
7 75 75 5625 5625 5625
8 60 60 3600 3600 3600
9 75 80 6000 5625 6400
10 70 65 4550 4900 4225
Totals 730 755 55875 54200 58025
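The column totals in the table can be checked with a few lines of Python. This sketch rebuilds each row (x, y, xy, x², y²) from the sample data and sums the columns.

```python
# Proficiency exam scores (x) and course grades (y) for students 1..10.
x = [60, 90, 70, 85, 80, 65, 75, 60, 75, 70]
y = [70, 95, 70, 75, 90, 75, 75, 60, 80, 65]

# One row per student: x, y, xy, x^2, y^2
rows = [(xi, yi, xi * yi, xi ** 2, yi ** 2) for xi, yi in zip(x, y)]
# Column totals: sum x, sum y, sum xy, sum x^2, sum y^2
totals = tuple(sum(column) for column in zip(*rows))

widths = (4, 4, 7, 7, 7)
header = ("x", "y", "xy", "x^2", "y^2")
print("Student " + " ".join(f"{h:>{w}}" for h, w in zip(header, widths)))
for student, row in enumerate(rows, start=1):
    print(f"{student:>7} " + " ".join(f"{v:>{w}}" for v, w in zip(row, widths)))
print(f"{'Totals':>7} " + " ".join(f"{v:>{w}}" for v, w in zip(totals, widths)))
```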
Sample Calculation
$$B = \frac{10(55875) - (730)(755)}{10(54200) - (730)^2} = \frac{7600}{9100} = 0.835 \approx 0.84$$
$$A = \frac{\sum y}{n} - B\,\frac{\sum x}{n} = 75.5 - 0.84(73) = 75.5 - 61.32 = 14.18$$
The required regression line equation is

$$\hat{y} = 14.18 + 0.84x$$
Sample Calculation – Application
Hence, the best estimate of the course grade of a student who obtains a proficiency exam score of 88 is
$$\hat{y} = 14.18 + 0.84(88) = 88.1$$
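The prediction step is a single evaluation of the fitted line. This sketch (the function name `predict` is illustrative) evaluates it with the rounded coefficients used above and with unrounded least-squares coefficients for the same data. Rounding B shifts the estimate only slightly: 88.1 versus about 88.03.

```python
def predict(A, B, x):
    """Evaluate the fitted regression line y-hat = A + B*x."""
    return A + B * x


# Coefficients as rounded in the hand calculation.
print(round(predict(14.18, 0.84, 88), 2))  # 88.1

# Unrounded coefficients from the column totals (det = 10*54200 - 730^2 = 9100):
# A = (755*54200 - 730*55875) / 9100,  B = (10*55875 - 730*755) / 9100
A_full, B_full = 132250 / 9100, 7600 / 9100
print(round(predict(A_full, B_full, 88), 2))  # 88.03
```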
Correlation Analysis
• When two variables are statistically associated, they are said to be correlated.
• Correlation analysis measures the degree of relationship between two variables, x and y, by means of a single number called the correlation coefficient, r.
Coefficient of Correlation
• r takes values in the range −1 ≤ r ≤ 1; the closer |r| is to 1, the stronger the relationship.
• If r is negative, there is an inverse relationship between x and y, i.e., as x increases, y tends to decrease, and vice versa.
• If r is positive, there is a direct relationship between x and y, i.e., as x increases, y tends to increase, and vice versa.
• If r = 0, the two sets of data are uncorrelated (no correlation).
Pearson Product-Moment Correlation Coefficient
$$r = \frac{n\sum xy - \sum x \sum y}{\sqrt{\left[n\sum x^2 - \left(\sum x\right)^2\right]\left[n\sum y^2 - \left(\sum y\right)^2\right]}}$$
Sample Calculation
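Using the same data,
$$r = \frac{10(55875) - (730)(755)}{\sqrt{\left[10(54200) - (730)^2\right]\left[10(58025) - (755)^2\right]}} = \frac{7600}{\sqrt{(9100)(10225)}} = 0.787882$$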
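The Pearson calculation above can be reproduced directly from the raw-sum formula. This is a minimal pure-Python sketch using the proficiency exam data; the function name `pearson_r` is illustrative.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation from the raw-sum formula."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    sum_y2 = sum(b * b for b in y)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator


x = [60, 90, 70, 85, 80, 65, 75, 60, 75, 70]  # proficiency exam score
y = [70, 95, 70, 75, 90, 75, 75, 60, 80, 65]  # course grade
print(f"r = {pearson_r(x, y):.6f}")  # r = 0.787882
```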
Rank Correlation Coefficient
• A nonparametric measure of association between two variables x and y is given by the Spearman rank correlation coefficient:
$$r = 1 - \frac{6\sum d^2}{n\left(n^2 - 1\right)}$$
where
d = difference in ranking for each pair
n = number of pairs of data
Sample Problem
Consider the following sets of data:
Student Number   Final Grade Average (x)   Extracurricular Performance (y)
1 68 D
2 62 E
3 60 E
4 99 A
5 68 C
6 78 C
7 98 B
8 84 B
9 78 B
10 91 A
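Sample Calculation
Each variable is ranked from best (rank 1) to worst, with tied values sharing the average of the ranks they occupy; for example, the two final grade averages of 68 occupy ranks 7 and 8, so each receives 7.5. Here xr and yr are the ranks of x and y, and d = xr − yr.
Student x y xr yr d d^2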
1 68 D 7.5 8 -0.5 0.25
2 62 E 9 9.5 -0.5 0.25
3 60 E 10 9.5 0.5 0.25
4 99 A 1 1.5 -0.5 0.25
5 68 C 7.5 6.5 1 1
6 78 C 5.5 6.5 -1 1
7 98 B 2 4 -2 4
8 84 B 4 4 0 0
9 78 B 5.5 4 1.5 2.25
10 91 A 3 1.5 1.5 2.25
Totals: Σd^2 = 11.5
Sample Calculation
The value of the Spearman rank correlation coefficient is
$$r = 1 - \frac{6(11.5)}{10\left(10^2 - 1\right)} = 1 - \frac{69}{990} = 0.930303$$
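The sketch below reproduces the Spearman calculation, including the average-rank rule for ties. It assumes letter grades are ranked alphabetically so that A is best (rank 1), which matches the table above; the helper names are illustrative. For comparison it also computes Pearson's r on the averaged ranks. With tied ranks the d² shortcut only approximates that value, which comes out slightly lower here (about 0.928).

```python
import math


def average_ranks(values, reverse=False):
    """Rank values 1..n in sorted order; tied values get the mean of their positions.

    reverse=True gives rank 1 to the largest value.
    """
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=reverse)
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the block while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of ranks start+1 .. end+1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def spearman(x_ranks, y_ranks):
    """Return (r, sum of d^2) using r = 1 - 6*sum(d^2) / (n(n^2 - 1))."""
    n = len(x_ranks)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(x_ranks, y_ranks))
    return 1 - 6 * sum_d2 / (n * (n * n - 1)), sum_d2


def pearson(a, b):
    """Plain Pearson r, used here on the ranks for comparison."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    sab = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    saa = sum((p - ma) ** 2 for p in a)
    sbb = sum((q - mb) ** 2 for q in b)
    return sab / math.sqrt(saa * sbb)


grades = [68, 62, 60, 99, 68, 78, 98, 84, 78, 91]                # final grade average
performance = ["D", "E", "E", "A", "C", "C", "B", "B", "B", "A"]  # A = best

x_ranks = average_ranks(grades, reverse=True)  # highest grade -> rank 1
y_ranks = average_ranks(performance)           # "A" sorts first -> rank 1
r_s, sum_d2 = spearman(x_ranks, y_ranks)

print(x_ranks)
print(y_ranks)
print(f"sum d^2 = {sum_d2}, r = {r_s:.6f}")  # sum d^2 = 11.5, r = 0.930303
print(f"Pearson r on the ranks = {pearson(x_ranks, y_ranks):.4f}")
```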
Curve-Fitting: Other Nonlinear Relationships
1. Exponential Model
$$y = a_1 e^{b_1 x}$$
Linearization:
$$\ln y = \ln a_1 + b_1 x$$
Equivalence (so that Y = A + BX):
$$Y = \ln y, \quad A = \ln a_1, \quad B = b_1, \quad X = x$$
2. Simple Power Equation
$$y = a_1 x^{b_1}$$
Linearization (log denotes the base-10 logarithm):
$$\log y = \log a_1 + b_1 \log x$$
Equivalence:
$$Y = \log y, \quad A = \log a_1, \quad X = \log x, \quad B = b_1$$
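3. Saturation-Growth-Rate (Hyperbolic) Equation
$$y = a_1 \frac{x}{b_1 + x}$$
Linearization:
$$\frac{1}{y} = \frac{1}{a_1} + \frac{b_1}{a_1}\cdot\frac{1}{x}$$
Equivalence:
$$Y = \frac{1}{y}, \quad A = \frac{1}{a_1}, \quad B = \frac{b_1}{a_1}, \quad X = \frac{1}{x}$$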
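The three linearizations can be sketched in Python as follows. The sketch assumes each model is fitted by ordinary least squares on its transformed variables (X, Y) and then back-transformed to a1 and b1; the function names (`fit_line`, `fit_exponential`, `fit_power`, `fit_saturation`) are illustrative, not from the source. The demo uses hypothetical noise-free data generated from known parameters, so each fit should recover them.

```python
import math


def fit_line(X, Y):
    """Least-squares intercept A and slope B of Y = A + B*X."""
    n = len(X)
    sx, sy = sum(X), sum(Y)
    sxy = sum(a * b for a, b in zip(X, Y))
    sxx = sum(a * a for a in X)
    B = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    A = sy / n - B * sx / n
    return A, B


def fit_exponential(x, y):
    """y = a1 * exp(b1 * x): fit ln y = ln a1 + b1 * x (requires y > 0)."""
    A, B = fit_line(x, [math.log(v) for v in y])
    return math.exp(A), B


def fit_power(x, y):
    """y = a1 * x**b1: fit log y = log a1 + b1 * log x (base 10; requires x, y > 0)."""
    A, B = fit_line([math.log10(v) for v in x], [math.log10(v) for v in y])
    return 10 ** A, B


def fit_saturation(x, y):
    """y = a1 * x / (b1 + x): fit 1/y = 1/a1 + (b1/a1) * (1/x) (requires x, y != 0)."""
    A, B = fit_line([1 / v for v in x], [1 / v for v in y])
    a1 = 1 / A
    return a1, B * a1


# Hypothetical noise-free data from known parameters; each fit should
# recover them up to floating-point rounding.
xs = [0.5, 1.0, 2.0, 3.0, 4.0]
print(fit_exponential(xs, [2 * math.exp(-0.5 * v) for v in xs]))  # about (2, -0.5)
print(fit_power(xs, [3 * v ** 1.5 for v in xs]))                  # about (3, 1.5)
print(fit_saturation(xs, [4 * v / (2 + v) for v in xs]))          # about (4, 2)
```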
Sample Problem
Given the following sets of data:
x y
0.50 1.90
1.00 1.50
1.30 1.20
1.60 1.00
2.00 0.80
2.20 0.78
2.50 0.65
3.10 0.46
3.90 0.30
4.40 0.23
Determine which of the linear, exponential, power, and hyperbolic models best fits the data.

Linear
x y x^2 y^2 xy
0.50 1.90 0.25 3.61 0.95
1.00 1.50 1.00 2.25 1.5
1.30 1.20 1.69 1.44 1.56
1.60 1.00 2.56 1.00 1.6
2.00 0.80 4.00 0.64 1.6
2.20 0.78 4.84 0.61 1.716
2.50 0.65 6.25 0.42 1.625
3.10 0.46 9.61 0.21 1.426
3.90 0.30 15.21 0.09 1.17
4.40 0.23 19.36 0.05 1.012
Totals 22.50 8.82 64.77 10.33 14.16
$$\hat{y} = 1.79 - 0.40x, \qquad r = -0.95$$

Exponential
Transformed variables: X = x, Y = ln y
x y Y=ln(y) X^2 Y^2 XY
0.50 1.90 0.641854 0.25 0.41 0.320927
1.00 1.50 0.405465 1.00 0.16 0.405465
1.30 1.20 0.182322 1.69 0.03 0.237018
1.60 1.00 0 2.56 0.00 0
2.00 0.80 -0.22314 4.00 0.05 -0.44629
2.20 0.78 -0.24846 4.84 0.06 -0.54661
2.50 0.65 -0.43078 6.25 0.19 -1.07696
3.10 0.46 -0.77653 9.61 0.60 -2.40724
3.90 0.30 -1.20397 15.21 1.45 -4.69549
4.40 0.23 -1.46968 19.36 2.16 -6.46657
Totals 22.50 8.82 -3.12 64.77 5.12 -14.68
$$\hat{y} = 2.47e^{-0.54x}, \qquad r \approx -0.999$$

Power
Transformed variables: X = log x, Y = log y
x y X=log(x) Y=log(y) X^2 Y^2 XY
0.50 1.90 -0.30103 0.278754 0.090619 0.077704 -0.08391
1.00 1.50 0 0.176091 0 0.031008 0
1.30 1.20 0.113943 0.079181 0.012983 0.00627 0.009022
1.60 1.00 0.20412 0 0.041665 0 0
2.00 0.80 0.30103 -0.09691 0.090619 0.009392 -0.02917
2.20 0.78 0.342423 -0.10791 0.117253 0.011644 -0.03695
2.50 0.65 0.39794 -0.18709 0.158356 0.035001 -0.07445
3.10 0.46 0.491362 -0.33724 0.241436 0.113732 -0.16571
3.90 0.30 0.591065 -0.52288 0.349357 0.273402 -0.30906
4.40 0.23 0.643453 -0.63827 0.414031 0.407391 -0.4107
Totals 22.50 8.82 2.78 -1.36 1.52 0.97 -1.10
$$\hat{y} = 1.368x^{-0.98}, \qquad r = -0.95$$

Hyperbolic
Transformed variables: X = 1/x, Y = 1/y
x y X=1/x Y=1/y X^2 Y^2 XY
0.50 1.90 2 0.526316 4 0.277008 1.052632
1.00 1.50 1 0.666667 1 0.444444 0.666667
1.30 1.20 0.769231 0.833333 0.591716 0.694444 0.641026
1.60 1.00 0.625 1 0.390625 1 0.625
2.00 0.80 0.5 1.25 0.25 1.5625 0.625
2.20 0.78 0.454545 1.282051 0.206612 1.643655 0.582751
2.50 0.65 0.4 1.538462 0.16 2.366864 0.615385
3.10 0.46 0.322581 2.173913 0.104058 4.725898 0.701262
3.90 0.30 0.25641 3.333333 0.065746 11.11111 0.854701
4.40 0.23 0.227273 4.347826 0.051653 18.90359 0.988142
Totals 22.50 8.82 6.56 16.95 6.82 42.73 7.35
$$\hat{y} = \frac{0.374x}{x - 0.5576}, \qquad r = -0.63$$
Interpretation
Since the exponential model's r-value is the closest to 1 in absolute value (|r| ≈ 0.999) among the four mathematical models, it is considered the best-fit curve for this set of data. Each r here is computed on the linearized variables X and Y, not on the original y values.
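The four tables and the interpretation can be checked with the sketch below. Following the slides, it fits each model by least squares on its transformed variables, computes r on those same transformed variables, and picks the model with the largest |r|. Names are illustrative. It should reproduce the tabulated coefficients to rounding (for example a1 ≈ 2.47 and b1 ≈ −0.54 for the exponential model). It also shows that the exponential r is about −0.999 rather than exactly −1.

```python
import math

x = [0.50, 1.00, 1.30, 1.60, 2.00, 2.20, 2.50, 3.10, 3.90, 4.40]
y = [1.90, 1.50, 1.20, 1.00, 0.80, 0.78, 0.65, 0.46, 0.30, 0.23]


def fit_line(X, Y):
    """Return (A, B, r) for Y = A + B*X using the raw-sum formulas."""
    n = len(X)
    sx, sy = sum(X), sum(Y)
    sxy = sum(a * b for a, b in zip(X, Y))
    sxx = sum(a * a for a in X)
    syy = sum(b * b for b in Y)
    B = (n * sxy - sx * sy) / (n * sxx - sx ** 2)
    A = sy / n - B * sx / n
    r = (n * sxy - sx * sy) / math.sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))
    return A, B, r


# (X, Y) for each model, as in the equivalence tables.
transforms = {
    "linear": (x, y),
    "exponential": (x, [math.log(v) for v in y]),
    "power": ([math.log10(v) for v in x], [math.log10(v) for v in y]),
    "hyperbolic": ([1 / v for v in x], [1 / v for v in y]),
}

results = {}
for name, (X, Y) in transforms.items():
    A, B, r = fit_line(X, Y)
    # Back-transform: (A, B) for the linear model, (a1, b1) for the others.
    if name == "linear":
        params = (A, B)               # y-hat = A + B x
    elif name == "exponential":
        params = (math.exp(A), B)     # y-hat = a1 exp(b1 x)
    elif name == "power":
        params = (10 ** A, B)         # y-hat = a1 x^b1
    else:
        params = (1 / A, B / A)       # y-hat = a1 x / (b1 + x)
    results[name] = (params, r)
    print(f"{name:>11}: {params[0]:.4f}, {params[1]:.4f}, r = {r:.4f}")

best = max(results, key=lambda name: abs(results[name][1]))
print("best fit by |r|:", best)  # best fit by |r|: exponential
```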
