Engineering Probability and Statistics
Regression and Correlation
Introduction
Linear Regression
• Given paired data (x, y), a regression equation relating y to x can be obtained
• The regression equation may be of degree one (linear), two (quadratic), three (cubic), or higher (nth-order polynomial)
Linear regression
From analytic geometry:
ŷ = A + Bx
where the coefficients A and B represent the y-intercept and the slope, respectively.
The symbol ŷ is used to distinguish between the
predicted value given by the regression line and
an actual observed value y for some value of x.
Sample Data
Student Number   Proficiency Exam Score (x)   Course Grade (y)
1                60                           70
2                90                           95
3                70                           70
4                85                           75
5                80                           90
6                65                           75
7                75                           75
8                60                           60
9                75                           80
10               70                           65
Scatter Diagram and the Best-fit Line
[Figure: Proficiency Exam and Course Grade of 10 Students. Scatter diagram with the best-fit line; x-axis: Proficiency Exam Score (0 to 100), y-axis: Course Grade (0 to 100).]
Method of Least Squares
• The least-squares procedure selects that
particular line for which the sum of the
squares of the vertical distances from the
observed points to the line is as small as
possible.
• Normal equations:
$$nA + B\sum x = \sum y$$
$$A\sum x + B\sum x^2 = \sum xy$$
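Regression Coefficients
$$B = \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - \left(\sum x\right)^2}$$
$$A = \bar{y} - B\bar{x}$$
where $\bar{x} = \sum x / n$ and $\bar{y} = \sum y / n$ are the sample means.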
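A minimal Python sketch of the two formulas above, assuming the data arrive as two equal-length lists (the function name `fit_line` is hypothetical). It forms the sums that appear in the normal equations and solves the 2×2 system in closed form; the slope expression is what Cramer's rule gives for that system.

```python
def fit_line(xs, ys):
    """Least-squares line y_hat = A + B*x (hypothetical helper).

    Solves the normal equations
        n*A      + B*sum(x)   = sum(y)
        A*sum(x) + B*sum(x^2) = sum(xy)
    in closed form.
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least two points")
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_x2 = sum(x * x for x in xs)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    det = n * sum_x2 - sum_x ** 2  # determinant of the normal-equation matrix
    if det == 0:
        raise ValueError("all x values are equal, so the slope is undefined")
    B = (n * sum_xy - sum_x * sum_y) / det  # slope
    A = sum_y / n - B * sum_x / n  # intercept: y_bar - B * x_bar
    return A, B


# Points lying exactly on y = 2 + 3x recover A = 2, B = 3
print(fit_line([0, 1, 2, 3], [2, 5, 8, 11]))  # (2.0, 3.0)
```

Because the function only needs the five sums, the same code applies unchanged to the sample data that follows.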
Sample Calculation
Student   x     y     xy      x^2     y^2
1         60    70    4200    3600    4900
2         90    95    8550    8100    9025
3         70    70    4900    4900    4900
4         85    75    6375    7225    5625
5         80    90    7200    6400    8100
6         65    75    4875    4225    5625
7         75    75    5625    5625    5625
8         60    60    3600    3600    3600
9         75    80    6000    5625    6400
10        70    65    4550    4900    4225
Totals    730   755   55875   54200   58025
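The column totals can be checked with a few lines of Python. This is a sketch; the list names `x` and `y` are simply labels for the proficiency exam scores and course grades in the table.

```python
# Proficiency exam scores (x) and course grades (y) for students 1 to 10
x = [60, 90, 70, 85, 80, 65, 75, 60, 75, 70]
y = [70, 95, 70, 75, 90, 75, 75, 60, 80, 65]

totals = {
    "x": sum(x),
    "y": sum(y),
    "xy": sum(a * b for a, b in zip(x, y)),
    "x^2": sum(a * a for a in x),
    "y^2": sum(b * b for b in y),
}
print(totals)
# {'x': 730, 'y': 755, 'xy': 55875, 'x^2': 54200, 'y^2': 58025}
```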
$$B = \frac{10(55875) - (730)(755)}{10(54200) - (730)^2} = \frac{7600}{9100} \approx 0.84$$
$$A = \frac{\sum y}{n} - B\,\frac{\sum x}{n} = 75.5 - 0.84(73) = 75.5 - 61.32 = 14.18$$

The required regression line equation is

$$\hat{y} = 14.18 + 0.84x$$
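The same arithmetic in Python, as a sketch that starts from the column totals. It also shows the effect of rounding: the calculation above rounds B to 0.84 before computing A, which yields A = 14.18, whereas carrying the unrounded slope B = 7600/9100 ≈ 0.8352 gives A ≈ 14.53.

```python
n = 10
sum_x, sum_y, sum_xy, sum_x2 = 730, 755, 55875, 54200  # totals from the table

# Slope and intercept from the least-squares formulas
B = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)  # 7600 / 9100
A = sum_y / n - B * sum_x / n

# Reproduce the slide: round B to two decimals first, then compute A
B_rounded = round(B, 2)
A_from_rounded = sum_y / n - B_rounded * sum_x / n

print(f"unrounded: B = {B:.4f}, A = {A:.4f}")  # B = 0.8352, A = 14.5330
print(f"as on the slide: B = {B_rounded}, A = {A_from_rounded:.2f}")  # B = 0.84, A = 14.18
```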
Sample Calculation – Application
Hence, the best estimate of the course grade of
a student who obtains a proficiency exam score
of 88 is

$$\hat{y} = 14.18 + 0.84(88) = 14.18 + 73.92 \approx 88.1$$
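Evaluating the fitted line is a one-line function. In this sketch `predict` is a hypothetical name and its defaults are the rounded coefficients from the slides; passing the unrounded coefficients gives a slightly different estimate.

```python
def predict(x, A=14.18, B=0.84):
    """Evaluate y_hat = A + B*x (defaults are the rounded slide coefficients)."""
    return A + B * x


print(round(predict(88), 2))  # 88.1
# With the unrounded coefficients A = 132250/9100 and B = 7600/9100
print(round(predict(88, A=132250 / 9100, B=7600 / 9100), 2))  # 88.03
```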
Correlation Analysis
• When two variables are mathematically associated, they are said to be correlated
• Correlation analysis measures the degree of relationship between two variables, x and y, by means of a single number called the correlation coefficient, r.
Coefficient of Correlation
• Has the range of values −1 ≤ r ≤ 1
• If r is negative, there is an inverse relationship between x and y, i.e., as x increases, y decreases, and vice versa.
• If r is positive, there is a direct relationship between x and y, i.e., as x increases, y increases, and as x decreases, y decreases.
• If r = 0, the two sets of data are uncorrelated (no linear correlation).
Pearson Product-Moment Correlation Coefficient

$$r = \frac{n\sum xy - \sum x \sum y}{\sqrt{\left[n\sum x^2 - \left(\sum x\right)^2\right]\left[n\sum y^2 - \left(\sum y\right)^2\right]}}$$
Sample Calculation
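Using the same data set,

$$r = \frac{10(55875) - (730)(755)}{\sqrt{\left[10(54200) - (730)^2\right]\left[10(58025) - (755)^2\right]}} = \frac{7600}{\sqrt{(9100)(10225)}} \approx 0.787882$$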
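A direct Python transcription of the Pearson formula, given as a sketch (`pearson_r` is a hypothetical helper name). Applied to the proficiency exam data it reproduces the value above.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson product-moment correlation via the computational formula."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(v * v for v in xs)
    syy = sum(v * v for v in ys)
    sxy = sum(a * b for a, b in zip(xs, ys))
    num = n * sxy - sx * sy
    den = sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))
    return num / den


x = [60, 90, 70, 85, 80, 65, 75, 60, 75, 70]
y = [70, 95, 70, 75, 90, 75, 75, 60, 80, 65]
print(round(pearson_r(x, y), 6))  # 0.787882
```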
Rank Correlation Coefficient
• A nonparametric measure of association
between two variables x and y is given by the
Spearman Rank Correlation Coefficient
$$r = 1 - \frac{6\sum d^2}{n\left(n^2 - 1\right)}$$

where
d = difference in ranking for each pair
n = number of pairs of data
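A sketch of the ranking step and the Spearman formula in Python; the helper names are hypothetical. Rank 1 goes to the largest value, and tied values receive the average of the ranks they occupy, which is the convention used in the worked example. When ties are present this formula is an approximation; the exact tie-corrected value is the Pearson coefficient computed on the ranks.

```python
def average_ranks(values):
    """Rank values from largest (rank 1) to smallest; ties share the mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend 'end' across a run of equal values
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of positions start+1 .. end+1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def spearman_r(x_ranks, y_ranks):
    """Spearman coefficient r = 1 - 6*sum(d^2) / (n*(n^2 - 1))."""
    n = len(x_ranks)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(x_ranks, y_ranks))
    return 1 - 6 * sum_d2 / (n * (n * n - 1))


print(average_ranks([10, 20, 20, 30]))  # [4.0, 2.5, 2.5, 1.0]
```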
Sample Problem
Consider the following data:
Student Number   Final Grade Average (x)   Extra-Curricular Performance (y)
1                68                        D
2                62                        E
3                60                        E
4                99                        A
5                68                        C
6                78                        C
7                98                        B
8                84                        B
9                78                        B
10               91                        A
Sample Calculation
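Student   x    y   xr    yr    d      d^2
1         68   D   7.5   8     -0.5   0.25
2         62   E   9     9.5   -0.5   0.25
3         60   E   10    9.5   0.5    0.25
4         99   A   1     1.5   -0.5   0.25
5         68   C   7.5   6.5   1      1
6         78   C   5.5   6.5   -1     1
7         98   B   2     4     -2     4
8         84   B   4     4     0      0
9         78   B   5.5   4     1.5    2.25
10        91   A   3     1.5   1.5    2.25
Totals                                11.5

xr and yr are the ranks of x and y (rank 1 = highest grade or best letter; tied values share the average of the ranks they occupy), and d = xr − yr.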
The value of the coefficient of correlation is

$$r = 1 - \frac{6(11.5)}{10\left(10^2 - 1\right)} = 1 - \frac{69}{990} \approx 0.930303$$
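Reproducing the worked example in Python. This is a sketch: the numeric scale used for the letter grades (A = 5 down to E = 1) is an assumption made only so that A ranks highest, and the hypothetical helper `average_ranks` follows the average-rank convention of the table.

```python
def average_ranks(values):
    """Rank from largest (1) to smallest; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for k in range(start, end + 1):
            ranks[order[k]] = (start + end) / 2 + 1
        start = end + 1
    return ranks


final_grade = [68, 62, 60, 99, 68, 78, 98, 84, 78, 91]
performance = ["D", "E", "E", "A", "C", "C", "B", "B", "B", "A"]
letter_score = {"A": 5, "B": 4, "C": 3, "D": 2, "E": 1}  # assumed scale, A best

xr = average_ranks(final_grade)
yr = average_ranks([letter_score[g] for g in performance])
sum_d2 = sum((a - b) ** 2 for a, b in zip(xr, yr))
n = len(xr)
r = 1 - 6 * sum_d2 / (n * (n ** 2 - 1))

print(xr)  # [7.5, 9.0, 10.0, 1.0, 7.5, 5.5, 2.0, 4.0, 5.5, 3.0]
print(yr)  # [8.0, 9.5, 9.5, 1.5, 6.5, 6.5, 4.0, 4.0, 4.0, 1.5]
print(sum_d2, round(r, 6))  # 11.5 0.930303
```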
Curve-Fitting: Other Nonlinear Relationships
1. Exponential Model

$$y = a_1 e^{b_1 x}$$

Linearization:
$$\ln y = \ln a_1 + b_1 x$$
Equivalence:
$\hat{y} = \ln y$, $A = \ln a_1$, $B = b_1$, $x = x$
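A sketch of fitting the exponential model through its linearization, assuming all y values are positive (the helper names are hypothetical). Note that this minimizes squared errors in ln y rather than in y itself, so it is not the same as a direct nonlinear least-squares fit of the original equation.

```python
from math import exp, log


def fit_line(xs, ys):
    """Least-squares intercept A and slope B for y = A + B*x."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(v * v for v in xs)
    sxy = sum(a * b for a, b in zip(xs, ys))
    B = (n * sxy - sx * sy) / (n * sxx - sx ** 2)
    A = (sy - B * sx) / n
    return A, B


def fit_exponential(xs, ys):
    """Fit y = a1 * e^(b1*x) by regressing ln y on x; requires every y > 0."""
    A, B = fit_line(xs, [log(v) for v in ys])
    return exp(A), B  # a1 = e^A, b1 = B


xs = [0.0, 1.0, 2.0, 3.0]
ys = [2.0 * exp(0.5 * v) for v in xs]  # noise-free data from a1 = 2, b1 = 0.5
a1, b1 = fit_exponential(xs, ys)
print(round(a1, 6), round(b1, 6))  # 2.0 0.5
```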
2. Simple Power Equation

$$y = a_1 x^{b_1}$$

Linearization:
$$\log y = \log a_1 + b_1 \log x$$
Equivalence:
$\hat{y} = \log y$, $A = \log a_1$, $x = \log x$, $B = b_1$
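The power model can be fitted the same way by taking base-10 logarithms of both variables. A sketch, assuming all x and y values are positive; `fit_line` and `fit_power` are hypothetical helper names.

```python
from math import log10


def fit_line(xs, ys):
    """Least-squares intercept A and slope B for y = A + B*x."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(v * v for v in xs)
    sxy = sum(a * b for a, b in zip(xs, ys))
    B = (n * sxy - sx * sy) / (n * sxx - sx ** 2)
    A = (sy - B * sx) / n
    return A, B


def fit_power(xs, ys):
    """Fit y = a1 * x^b1 by regressing log10(y) on log10(x); requires x, y > 0."""
    A, B = fit_line([log10(v) for v in xs], [log10(v) for v in ys])
    return 10 ** A, B  # a1 = 10^A, b1 = B


xs = [1.0, 2.0, 4.0, 8.0]
ys = [3.0 * v ** 2 for v in xs]  # noise-free data from a1 = 3, b1 = 2
a1, b1 = fit_power(xs, ys)
print(round(a1, 6), round(b1, 6))  # 3.0 2.0
```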
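3. Saturation-growth-rate (hyperbolic) equation

$$y = a_1 \frac{x}{b_1 + x}$$

Linearization:
$$\frac{1}{y} = \frac{1}{a_1} + \frac{b_1}{a_1}\,\frac{1}{x}$$
Equivalence:
$\hat{y} = \frac{1}{y}$, $A = \frac{1}{a_1}$, $B = \frac{b_1}{a_1}$, $x = \frac{1}{x}$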
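For the saturation-growth-rate model, the fitted intercept and slope of 1/y against 1/x are converted back with a1 = 1/A and b1 = B·a1. A sketch, assuming no x or y value is zero; the helper names are hypothetical.

```python
def fit_line(xs, ys):
    """Least-squares intercept A and slope B for y = A + B*x."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(v * v for v in xs)
    sxy = sum(a * b for a, b in zip(xs, ys))
    B = (n * sxy - sx * sy) / (n * sxx - sx ** 2)
    A = (sy - B * sx) / n
    return A, B


def fit_saturation(xs, ys):
    """Fit y = a1*x / (b1 + x) by regressing 1/y on 1/x; requires x, y != 0."""
    A, B = fit_line([1 / v for v in xs], [1 / v for v in ys])
    a1 = 1 / A  # since A = 1/a1
    b1 = B * a1  # since B = b1/a1
    return a1, b1


xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0 * v / (1.0 + v) for v in xs]  # noise-free data from a1 = 2, b1 = 1
a1, b1 = fit_saturation(xs, ys)
print(round(a1, 6), round(b1, 6))  # 2.0 1.0
```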
Sample Problem
Given the following data:
x y
0.50 1.90
1.00 1.50
1.30 1.20
1.60 1.00
2.00 0.80
2.20 0.78
2.50 0.65
3.10 0.46
3.90 0.30
4.40 0.23

Determine the mathematical model that best fits the data.


Linear
x y x^2 y^2 xy
0.50 1.90 0.25 3.61 0.95
1.00 1.50 1.00 2.25 1.5
1.30 1.20 1.69 1.44 1.56
1.60 1.00 2.56 1.00 1.6
2.00 0.80 4.00 0.64 1.6
2.20 0.78 4.84 0.61 1.716
2.50 0.65 6.25 0.42 1.625
3.10 0.46 9.61 0.21 1.426
3.90 0.30 15.21 0.09 1.17
4.40 0.23 19.36 0.05 1.012
Totals: 22.50 8.82 64.77 10.33 14.16

$\hat{y} = 1.79 - 0.40x$,  $r = -0.95$
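A sketch that reproduces the linear fit and its r from the raw data rather than the rounded column sums (`fit_with_r` is a hypothetical helper name).

```python
from math import sqrt


def fit_with_r(xs, ys):
    """Return intercept A, slope B and Pearson r for the least-squares line."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(v * v for v in xs)
    syy = sum(v * v for v in ys)
    sxy = sum(a * b for a, b in zip(xs, ys))
    B = (n * sxy - sx * sy) / (n * sxx - sx ** 2)
    A = (sy - B * sx) / n
    r = (n * sxy - sx * sy) / sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))
    return A, B, r


x = [0.50, 1.00, 1.30, 1.60, 2.00, 2.20, 2.50, 3.10, 3.90, 4.40]
y = [1.90, 1.50, 1.20, 1.00, 0.80, 0.78, 0.65, 0.46, 0.30, 0.23]
A, B, r = fit_with_r(x, y)
print(f"y_hat = {A:.2f} + ({B:.2f})x, r = {r:.2f}")  # y_hat = 1.79 + (-0.40)x, r = -0.95
```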


Exponential
x (X)   y   ln y (Y)   x^2 (X^2)   (ln y)^2 (Y^2)   x ln y (XY)
0.50 1.90 0.641854 0.25 0.41 0.320927
1.00 1.50 0.405465 1.00 0.16 0.405465
1.30 1.20 0.182322 1.69 0.03 0.237018
1.60 1.00 0 2.56 0.00 0
2.00 0.80 -0.22314 4.00 0.05 -0.44629
2.20 0.78 -0.24846 4.84 0.06 -0.54661
2.50 0.65 -0.43078 6.25 0.19 -1.07696
3.10 0.46 -0.77653 9.61 0.60 -2.40724
3.90 0.30 -1.20397 15.21 1.45 -4.69549
4.40 0.23 -1.46968 19.36 2.16 -6.46657
Totals: 22.50 8.82 -3.12 64.77 5.12 -14.68

$\hat{y} = 2.47e^{-0.54x}$,  $r = -1.00$
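The exponential fit in Python, as a sketch: it regresses ln y on x, so r here is the correlation between x and ln y, as in the table. The unrounded value is about −0.999, which rounds to −1.00. `fit_with_r` is a hypothetical helper name.

```python
from math import exp, log, sqrt


def fit_with_r(xs, ys):
    """Return intercept A, slope B and Pearson r for the least-squares line."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(v * v for v in xs)
    syy = sum(v * v for v in ys)
    sxy = sum(a * b for a, b in zip(xs, ys))
    B = (n * sxy - sx * sy) / (n * sxx - sx ** 2)
    A = (sy - B * sx) / n
    r = (n * sxy - sx * sy) / sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))
    return A, B, r


x = [0.50, 1.00, 1.30, 1.60, 2.00, 2.20, 2.50, 3.10, 3.90, 4.40]
y = [1.90, 1.50, 1.20, 1.00, 0.80, 0.78, 0.65, 0.46, 0.30, 0.23]

# Regress ln y on x, then undo the transformation: a1 = e^A, b1 = B
A, B, r = fit_with_r(x, [log(v) for v in y])
a1, b1 = exp(A), B
print(f"y_hat = {a1:.2f} e^({b1:.2f}x), r = {r:.2f}")  # y_hat = 2.47 e^(-0.54x), r = -1.00
```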


Power
x   y   log x (X)   log y (Y)   (log x)^2 (X^2)   (log y)^2 (Y^2)   log x log y (XY)
0.50 1.90 -0.30103 0.278754 0.090619 0.077704 -0.08391
1.00 1.50 0 0.176091 0 0.031008 0
1.30 1.20 0.113943 0.079181 0.012983 0.00627 0.009022
1.60 1.00 0.20412 0 0.041665 0 0
2.00 0.80 0.30103 -0.09691 0.090619 0.009392 -0.02917
2.20 0.78 0.342423 -0.10791 0.117253 0.011644 -0.03695
2.50 0.65 0.39794 -0.18709 0.158356 0.035001 -0.07445
3.10 0.46 0.491362 -0.33724 0.241436 0.113732 -0.16571
3.90 0.30 0.591065 -0.52288 0.349357 0.273402 -0.30906
4.40 0.23 0.643453 -0.63827 0.414031 0.407391 -0.4107
Totals: 22.50 8.82 2.78 -1.36 1.52 0.97 -1.10

$\hat{y} = 1.368x^{-0.98}$,  $r = -0.95$
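The power fit as a Python sketch, regressing log10(y) on log10(x) and converting back with a1 = 10^A (`fit_with_r` is a hypothetical helper name). The unrounded exponent is about −0.976.

```python
from math import log10, sqrt


def fit_with_r(xs, ys):
    """Return intercept A, slope B and Pearson r for the least-squares line."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(v * v for v in xs)
    syy = sum(v * v for v in ys)
    sxy = sum(a * b for a, b in zip(xs, ys))
    B = (n * sxy - sx * sy) / (n * sxx - sx ** 2)
    A = (sy - B * sx) / n
    r = (n * sxy - sx * sy) / sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))
    return A, B, r


x = [0.50, 1.00, 1.30, 1.60, 2.00, 2.20, 2.50, 3.10, 3.90, 4.40]
y = [1.90, 1.50, 1.20, 1.00, 0.80, 0.78, 0.65, 0.46, 0.30, 0.23]

# Regress log10(y) on log10(x): a1 = 10^A, b1 = B
A, B, r = fit_with_r([log10(v) for v in x], [log10(v) for v in y])
a1, b1 = 10 ** A, B
print(f"y_hat = {a1:.3f} x^({b1:.2f}), r = {r:.2f}")  # y_hat = 1.368 x^(-0.98), r = -0.95
```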


Hyperbolic
x   y   1/x (X)   1/y (Y)   (1/x)^2 (X^2)   (1/y)^2 (Y^2)   (1/x)(1/y) (XY)
0.50 1.90 2 0.526316 4 0.277008 1.052632
1.00 1.50 1 0.666667 1 0.444444 0.666667
1.30 1.20 0.769231 0.833333 0.591716 0.694444 0.641026
1.60 1.00 0.625 1 0.390625 1 0.625
2.00 0.80 0.5 1.25 0.25 1.5625 0.625
2.20 0.78 0.454545 1.282051 0.206612 1.643655 0.582751
2.50 0.65 0.4 1.538462 0.16 2.366864 0.615385
3.10 0.46 0.322581 2.173913 0.104058 4.725898 0.701262
3.90 0.30 0.25641 3.333333 0.065746 11.11111 0.854701
4.40 0.23 0.227273 4.347826 0.051653 18.90359 0.988142
Totals: 22.50 8.82 6.56 16.95 6.82 42.73 7.35

$\hat{y} = \frac{0.374x}{x - 0.5576}$,  $r = -0.63$
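The hyperbolic fit as a Python sketch, regressing 1/y on 1/x and converting back with a1 = 1/A and b1 = B·a1 (`fit_with_r` is a hypothetical helper name). The fitted b1 is negative, which is why the denominator b1 + x appears as x − 0.5576 above.

```python
from math import sqrt


def fit_with_r(xs, ys):
    """Return intercept A, slope B and Pearson r for the least-squares line."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(v * v for v in xs)
    syy = sum(v * v for v in ys)
    sxy = sum(a * b for a, b in zip(xs, ys))
    B = (n * sxy - sx * sy) / (n * sxx - sx ** 2)
    A = (sy - B * sx) / n
    r = (n * sxy - sx * sy) / sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))
    return A, B, r


x = [0.50, 1.00, 1.30, 1.60, 2.00, 2.20, 2.50, 3.10, 3.90, 4.40]
y = [1.90, 1.50, 1.20, 1.00, 0.80, 0.78, 0.65, 0.46, 0.30, 0.23]

# Regress 1/y on 1/x: A = 1/a1 and B = b1/a1
A, B, r = fit_with_r([1 / v for v in x], [1 / v for v in y])
a1 = 1 / A
b1 = B * a1
print(f"a1 = {a1:.3f}, b1 = {b1:.4f}, r = {r:.2f}")  # a1 = 0.374, b1 = -0.5576, r = -0.63
```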
Interpretation
Since the exponential model's r-value is the closest to 1 in absolute value (|r| ≈ 1.00) among the four models, it is considered the best-fit curve for this data set.
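The selection rule above can be sketched in Python by computing r for each model on its linearized variables and picking the largest |r|. The dictionary keys and helper name are hypothetical. Because each r is measured on a different transformed scale, this is a convenient screening rule rather than a comparison of errors in the original y units.

```python
from math import log, log10, sqrt


def pearson_r(xs, ys):
    """Pearson product-moment correlation via the computational formula."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(v * v for v in xs)
    syy = sum(v * v for v in ys)
    sxy = sum(a * b for a, b in zip(xs, ys))
    return (n * sxy - sx * sy) / sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))


x = [0.50, 1.00, 1.30, 1.60, 2.00, 2.20, 2.50, 3.10, 3.90, 4.40]
y = [1.90, 1.50, 1.20, 1.00, 0.80, 0.78, 0.65, 0.46, 0.30, 0.23]

# Linearized (X, Y) pairs for each candidate model
candidates = {
    "linear": (x, y),
    "exponential": (x, [log(v) for v in y]),
    "power": ([log10(v) for v in x], [log10(v) for v in y]),
    "hyperbolic": ([1 / v for v in x], [1 / v for v in y]),
}
scores = {name: pearson_r(X, Y) for name, (X, Y) in candidates.items()}
best = max(scores, key=lambda name: abs(scores[name]))

for name, r in scores.items():
    print(f"{name:12s} r = {r:+.2f}")
print("best fit by |r|:", best)  # best fit by |r|: exponential
```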
