Statistics
Regression and Correlation
Introduction
Linear Regression
• Given paired data (x, y), a regression equation can be obtained that predicts y from x
• The regression equation may be of degree one (linear), two (quadratic), three (cubic), or higher (nth-order polynomial)
Linear regression
From analytic geometry:
ŷ = A + Bx
where the coefficients A and B are the y-intercept and the slope, respectively.
The symbol ŷ is used to distinguish between the
predicted value given by the regression line and
an actual observed value y for some value of x.
Sample Data
Student  Proficiency Exam Score (x)  Course Grade (y)
1        60                          70
2        90                          95
3        70                          70
4        85                          75
5        80                          90
6        65                          75
7        75                          75
8        60                          60
9        75                          80
10       70                          65
Scatter Diagram and the Best-fit Line
[Figure: Proficiency Exam and Course Grade of 10 Students. Scatter diagram of the sample data with the best-fit line; x-axis: Proficiency Exam Score (0 to 100); y-axis: Course Grade (0 to 100).]
Method of Least Squares
• The least-squares procedure selects that
particular line for which the sum of the
squares of the vertical distances from the
observed points to the line is as small as
possible.
• Normal equations:
$$ nA + B\sum x = \sum y $$
$$ A\sum x + B\sum x^2 = \sum xy $$
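As a rough check of the least-squares idea, the Python sketch below (the helper names `solve_normal_equations` and `sse` are illustrative, not from the source) solves the two normal equations for A and B with Cramer's rule using the sample data above, then confirms that nudging either coefficient makes the sum of squared vertical distances larger.

```python
def solve_normal_equations(xs, ys):
    """Solve nA + B*sum(x) = sum(y) and A*sum(x) + B*sum(x^2) = sum(xy)."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    det = n * sxx - sx * sx  # determinant of the 2x2 coefficient matrix
    if det == 0:
        raise ValueError("all x values are equal; no unique line exists")
    a = (sy * sxx - sx * sxy) / det  # Cramer's rule for A
    b = (n * sxy - sx * sy) / det    # Cramer's rule for B
    return a, b


def sse(a, b, xs, ys):
    """Sum of squared vertical distances from the points to y = a + b*x."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))


xs = [60, 90, 70, 85, 80, 65, 75, 60, 75, 70]  # proficiency exam scores
ys = [70, 95, 70, 75, 90, 75, 75, 60, 80, 65]  # course grades
a, b = solve_normal_equations(xs, ys)
best = sse(a, b, xs, ys)

# Any small change to A or B should increase the sum of squares.
for da, db in [(0.5, 0.0), (-0.5, 0.0), (0.0, 0.01), (0.0, -0.01)]:
    assert sse(a + da, b + db, xs, ys) > best
print(f"A = {a:.4f}, B = {b:.4f}")  # A = 14.5330, B = 0.8352
```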
Regression Coefficients
$$ B = \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - \left(\sum x\right)^2} $$
$$ A = \bar{y} - B\bar{x} $$
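The same coefficients can be computed directly from the closed-form expressions. A minimal sketch, assuming plain Python lists of numbers; `least_squares_line` is a hypothetical helper name:

```python
def least_squares_line(xs, ys):
    """Return (A, B) for y-hat = A + B*x using the closed-form coefficients."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs")
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    denom = n * sxx - sx ** 2
    if denom == 0:
        raise ValueError("all x values are equal; the slope is undefined")
    b = (n * sxy - sx * sy) / denom
    a = sy / n - b * sx / n  # A = y-bar - B * x-bar
    return a, b


# Points that lie exactly on y = 2 + 3x are recovered exactly.
print(least_squares_line([0, 1, 2, 3], [2, 5, 8, 11]))  # (2.0, 3.0)
```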
Sample Calculation
Student  x    y    xy     x^2    y^2
1        60   70   4200   3600   4900
2        90   95   8550   8100   9025
3        70   70   4900   4900   4900
4        85   75   6375   7225   5625
5        80   90   7200   6400   8100
6        65   75   4875   4225   5625
7        75   75   5625   5625   5625
8        60   60   3600   3600   3600
9        75   80   6000   5625   6400
10       70   65   4550   4900   4225
Totals   730  755  55875  54200  58025
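The column totals can be rebuilt from the raw scores; this is only an arithmetic check of the table, with the lists typed in from the sample data:

```python
xs = [60, 90, 70, 85, 80, 65, 75, 60, 75, 70]  # proficiency exam score
ys = [70, 95, 70, 75, 90, 75, 75, 60, 80, 65]  # course grade

# One row per student: x, y, xy, x^2, y^2 (the columns of the table above).
rows = [(x, y, x * y, x * x, y * y) for x, y in zip(xs, ys)]
totals = [sum(col) for col in zip(*rows)]
print(totals)  # [730, 755, 55875, 54200, 58025]
```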
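$$ B = \frac{10(55875) - (730)(755)}{10(54200) - (730)^2} = \frac{7600}{9100} $$
B = 0.84
$$ A = \frac{\sum y}{n} - B\frac{\sum x}{n} $$
A = 75.5 − 0.84(73) = 75.5 − 61.32
A = 14.18
For a student with a proficiency exam score of x = 88, the predicted course grade is
ŷ = 14.18 + 0.84(88)
ŷ = 88.1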
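The hand calculation above rounds B to 0.84 before computing A. The sketch below (variable names are illustrative) reproduces those steps from the table totals and, for comparison, repeats them with the unrounded slope:

```python
n = 10
sum_x, sum_y, sum_xy, sum_x2 = 730, 755, 55875, 54200  # totals from the table

b_exact = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)  # 7600/9100
b = round(b_exact, 2)              # slope rounded to 0.84 before computing A
a = sum_y / n - b * sum_x / n      # 75.5 - 61.32
y_hat = a + b * 88                 # predicted course grade for x = 88

# Same steps without rounding the slope first, for comparison.
a_exact = sum_y / n - b_exact * sum_x / n
y_hat_exact = a_exact + b_exact * 88

print(f"B = {b}, A = {a:.2f}, y-hat = {y_hat:.1f}")                # B = 0.84, A = 14.18, y-hat = 88.1
print(f"unrounded: A = {a_exact:.2f}, y-hat = {y_hat_exact:.2f}")  # unrounded: A = 14.53, y-hat = 88.03
```

With the unrounded slope, A is about 14.53 and the prediction for x = 88 is about 88.03, so the slide's 88.1 reflects rounding B to two decimals first.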
Correlation Analysis
• When two variables are mathematically associated, so that changes in one tend to accompany changes in the other, they are said to be correlated
• Correlation analysis measures the degree of relationship between the two variables, x and y, by means of a single number called the correlation coefficient, r.
Coefficient of Correlation
• Takes values in the range −1 ≤ r ≤ 1
• If r is negative, there is an inverse relationship between x and y, i.e., as x increases, y tends to decrease, and vice versa.
• If r is positive, there is a direct relationship between x and y, i.e., as x increases, y tends to increase, and as x decreases, y tends to decrease.
• If r = 0, the two sets of data are uncorrelated (no linear correlation).
Pearson Product-Moment Correlation Coefficient
$$ r = \frac{n\sum xy - \sum x \sum y}{\sqrt{n\sum x^2 - \left(\sum x\right)^2}\,\sqrt{n\sum y^2 - \left(\sum y\right)^2}} $$
Sample Calculation
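Using the same sets of data (proficiency exam scores and course grades):
$$ r = \frac{10(55875) - (730)(755)}{\sqrt{10(54200) - (730)^2}\,\sqrt{10(58025) - (755)^2}} = \frac{7600}{\sqrt{9100}\,\sqrt{10225}} $$
r = 0.787882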
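A small Python sketch of the Pearson formula, written in the same sum-based form as the slide (`pearson_r` is a hypothetical name), reproduces the value for the proficiency-exam data:

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson product-moment correlation coefficient from raw sums."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    syy = sum(y * y for y in ys)
    numerator = n * sxy - sx * sy
    denominator = sqrt(n * sxx - sx ** 2) * sqrt(n * syy - sy ** 2)
    return numerator / denominator


xs = [60, 90, 70, 85, 80, 65, 75, 60, 75, 70]  # proficiency exam scores
ys = [70, 95, 70, 75, 90, 75, 75, 60, 80, 65]  # course grades
print(round(pearson_r(xs, ys), 6))  # 0.787882
```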
Rank Correlation Coefficient
• A nonparametric measure of association
between two variables x and y is given by the
Spearman Rank Correlation Coefficient
$$ r = 1 - \frac{6\sum d^2}{n\left(n^2 - 1\right)} $$
where
d = difference in ranking for each pair
n = number of pairs of data
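When no two values are tied, the 6∑d² formula gives exactly the Pearson coefficient computed on the ranks; with ties, as in the sample problem that follows, it is the usual textbook shortcut rather than an exact identity. The sketch below checks the tie-free case on made-up data; the helper names are hypothetical and the ranking assumes rank 1 = largest value:

```python
from math import sqrt


def ranks_no_ties(values):
    """Rank 1 = largest value; assumes every value is distinct."""
    ordered = sorted(values, reverse=True)
    return [ordered.index(v) + 1 for v in values]


def spearman_shortcut(xs, ys):
    """r = 1 - 6*sum(d^2) / (n(n^2 - 1)), with d the rank difference per pair."""
    n = len(xs)
    d2 = sum((rx - ry) ** 2 for rx, ry in zip(ranks_no_ties(xs), ranks_no_ties(ys)))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))


def pearson(xs, ys):
    """Pearson r from deviations about the means."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


xs = [3.1, 1.2, 4.8, 2.0, 5.5]  # made-up, tie-free illustration data
ys = [10, 4, 9, 6, 12]
print(spearman_shortcut(xs, ys))                      # 0.9
print(pearson(ranks_no_ties(xs), ranks_no_ties(ys)))  # 0.9
```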
Sample Problem
Consider the following sets of data:
Student  Final Grade Average (x)  Extra-Curricular Performance (y)
1        68                       D
2        62                       E
3        60                       E
4        99                       A
5        68                       C
6        78                       C
7        98                       B
8        84                       B
9        78                       B
10       91                       A
Sample Calculation
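Ranks run from best to worst (rank 1 = highest final grade average, and A = best performance); tied values share the average of the ranks they occupy.
Student  x   y  xr    yr    d     d^2
1        68  D  7.5   8     -0.5  0.25
2        62  E  9     9.5   -0.5  0.25
3        60  E  10    9.5   0.5   0.25
4        99  A  1     1.5   -0.5  0.25
5        68  C  7.5   6.5   1     1
6        78  C  5.5   6.5   -1    1
7        98  B  2     4     -2    4
8        84  B  4     4     0     0
9        78  B  5.5   4     1.5   2.25
10       91  A  3     1.5   1.5   2.25
Totals                            11.5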
The value of the rank correlation coefficient is
$$ r = 1 - \frac{6(11.5)}{10\left(10^2 - 1\right)} = 1 - \frac{69}{990} $$
r = 0.930303
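The ranking and the final value can be reproduced with the sketch below. It assumes the letter grades are ordered A (best) through E (worst) and gives tied values the average of the ranks they span; the point values in `GRADE_POINTS` are an illustrative encoding, and only their order matters:

```python
GRADE_POINTS = {"A": 5, "B": 4, "C": 3, "D": 2, "E": 1}  # assumed order: A best, E worst


def average_ranks(values):
    """Rank 1 = largest value; tied values share the mean of the ranks they span."""
    return [sum(w > v for w in values) + (sum(w == v for w in values) + 1) / 2
            for v in values]


final_avg = [68, 62, 60, 99, 68, 78, 98, 84, 78, 91]
performance = ["D", "E", "E", "A", "C", "C", "B", "B", "B", "A"]

xr = average_ranks(final_avg)
yr = average_ranks([GRADE_POINTS[g] for g in performance])
d2 = sum((a - b) ** 2 for a, b in zip(xr, yr))
n = len(xr)
r = 1 - 6 * d2 / (n * (n ** 2 - 1))
print(xr)
print(yr)
print(d2, round(r, 6))  # 11.5 0.930303
```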
Curve-Fitting: Other Nonlinear Relationships
1. Exponential Model
$$ y = a_1 e^{b_1 x} $$
Linearization:
$$ \ln y = \ln a_1 + b_1 x $$
Equivalence:
$$ \hat{y} = \ln y, \quad A = \ln a_1, \quad B = b_1, \quad x = x $$
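A minimal sketch of the exponential fit: take ln y, run ordinary linear least squares on (x, ln y), then undo the substitutions a1 = e^A and b1 = B. The helper names are hypothetical, and the test data are generated from a known curve so the recovered parameters can be checked:

```python
from math import exp, log


def fit_line(xs, ys):
    """Least-squares (A, B) for y = A + B*x, from the closed-form coefficients."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    b = (n * sxy - sx * sy) / (n * sxx - sx ** 2)
    return sy / n - b * sx / n, b


def fit_exponential(xs, ys):
    """Fit y = a1 * e^(b1 * x) by regressing ln y on x; every y must be > 0."""
    A, B = fit_line(xs, [log(y) for y in ys])
    return exp(A), B  # A = ln a1 -> a1 = e^A; B = b1


# Data generated from y = 2 * e^(0.5x), so the fit should return a1 = 2, b1 = 0.5.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2 * exp(0.5 * x) for x in xs]
a1, b1 = fit_exponential(xs, ys)
print(f"a1 = {a1:.4f}, b1 = {b1:.4f}")  # a1 = 2.0000, b1 = 0.5000
```

Because the line is fitted to ln y, this minimizes squared errors in ln y rather than in y itself, so on noisy data it can differ from a direct nonlinear least-squares fit.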
2. Simple Power Equation
$$ y = a_1 x^{b_1} $$
Linearization:
$$ \log y = \log a_1 + b_1 \log x $$
Equivalence:
$$ \hat{y} = \log y, \quad A = \log a_1, \quad x = \log x, \quad B = b_1 $$
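The power equation follows the same pattern with logarithms on both axes. This sketch assumes base-10 logarithms (any base gives the same b1 and the same a1 after back-transforming), positive x and y, and hypothetical helper names:

```python
from math import log10


def fit_line(xs, ys):
    """Least-squares (A, B) for y = A + B*x, from the closed-form coefficients."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    b = (n * sxy - sx * sy) / (n * sxx - sx ** 2)
    return sy / n - b * sx / n, b


def fit_power(xs, ys):
    """Fit y = a1 * x**b1 by regressing log10 y on log10 x; x and y must be > 0."""
    A, B = fit_line([log10(x) for x in xs], [log10(y) for y in ys])
    return 10 ** A, B  # A = log a1 -> a1 = 10^A; B = b1


# Data generated from y = 3 * x^1.5, so the fit should return a1 = 3, b1 = 1.5.
xs = [1.0, 2.0, 4.0, 8.0, 16.0]
ys = [3 * x ** 1.5 for x in xs]
a1, b1 = fit_power(xs, ys)
print(f"a1 = {a1:.4f}, b1 = {b1:.4f}")  # a1 = 3.0000, b1 = 1.5000
```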
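3. Saturation-Growth-Rate (Hyperbolic) Equation
$$ y = a_1 \frac{x}{b_1 + x} $$
Linearization:
$$ \frac{1}{y} = \frac{1}{a_1} + \frac{b_1}{a_1} \cdot \frac{1}{x} $$
Equivalence:
$$ \hat{y} = \frac{1}{y}, \quad A = \frac{1}{a_1}, \quad B = \frac{b_1}{a_1}, \quad x = \frac{1}{x} $$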
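For the saturation-growth-rate equation the substitution uses reciprocals: fit 1/y against 1/x, then recover a1 = 1/A and b1 = B·a1. A sketch with hypothetical helper names, assuming no x or y is zero, checked on data generated from a1 = 4, b1 = 2:

```python
def fit_line(xs, ys):
    """Least-squares (A, B) for y = A + B*x, from the closed-form coefficients."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    b = (n * sxy - sx * sy) / (n * sxx - sx ** 2)
    return sy / n - b * sx / n, b


def fit_saturation(xs, ys):
    """Fit y = a1 * x / (b1 + x) by regressing 1/y on 1/x; x and y must be nonzero."""
    A, B = fit_line([1 / x for x in xs], [1 / y for y in ys])
    a1 = 1 / A   # A = 1/a1
    b1 = B * a1  # B = b1/a1
    return a1, b1


# Data generated from y = 4x / (2 + x), so the fit should return a1 = 4, b1 = 2.
xs = [0.5, 1.0, 2.0, 4.0, 8.0]
ys = [4 * x / (2 + x) for x in xs]
a1, b1 = fit_saturation(xs, ys)
print(f"a1 = {a1:.4f}, b1 = {b1:.4f}")  # a1 = 4.0000, b1 = 2.0000
```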
Sample Problem
Given the following sets of data:
x y
0.50 1.90
1.00 1.50
1.30 1.20
1.60 1.00
2.00 0.80
2.20 0.78
2.50 0.65
3.10 0.46
3.90 0.30
4.40 0.23