
Pearson Correlation Coefficient

In statistics, the Pearson correlation coefficient (PCC, pronounced /ˈpɪərsən/),
also referred to as Pearson's r, the Pearson product-moment correlation
coefficient (PPMCC) or the bivariate correlation, is a measure of the linear
correlation between two variables X and Y. Owing to the Cauchy–Schwarz inequality,
it has a value between −1 and +1, where +1 is total positive linear correlation, 0 is no
linear correlation, and −1 is total negative linear correlation. It is widely used in the
sciences. It was developed by Karl Pearson from a related idea introduced by Francis
Galton in the 1880s.

Definition:

Pearson's correlation coefficient is the covariance of the two variables
divided by the product of their standard deviations. The form of the definition
involves a "product moment", that is, the mean (the first moment about the origin)
of the product of the mean-adjusted random variables; hence the modifier
product-moment in the name.

For a population

Pearson's correlation coefficient when applied to a population is commonly
represented by the Greek letter ρ (rho) and may be referred to as the population
correlation coefficient or the population Pearson correlation coefficient. The
formula for ρ is:

$$\rho_{X,Y} = \frac{\operatorname{cov}(X, Y)}{\sigma_X \sigma_Y}$$

where:

• $\operatorname{cov}$ is the covariance
• $\sigma_X$ is the standard deviation of X
• $\sigma_Y$ is the standard deviation of Y
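As a rough illustration of this definition, the sketch below treats two equal-length lists as the entire population and divides their covariance by the product of their standard deviations. The function name `population_rho` and the sample lists are hypothetical, chosen only for this example; it relies on the standard-library `statistics` module.

```python
from statistics import fmean, pstdev


def population_rho(xs, ys):
    """Return cov(X, Y) / (sigma_X * sigma_Y), treating xs and ys as a full population."""
    if len(xs) != len(ys) or len(xs) == 0:
        raise ValueError("xs and ys must be non-empty and of equal length")
    mu_x, mu_y = fmean(xs), fmean(ys)
    # Population covariance: the mean of the product of the mean-adjusted values.
    cov = fmean([(x - mu_x) * (y - mu_y) for x, y in zip(xs, ys)])
    # pstdev uses the population (divide-by-N) standard deviation.
    sigma_x, sigma_y = pstdev(xs), pstdev(ys)
    if sigma_x == 0 or sigma_y == 0:
        raise ValueError("rho is undefined when a variable has zero variance")
    return cov / (sigma_x * sigma_y)


# Hypothetical data: an exactly increasing and an exactly decreasing linear relation.
print(population_rho([1, 2, 3, 4], [2, 4, 6, 8]))
print(population_rho([1, 2, 3, 4], [8, 6, 4, 2]))
```

Up to floating-point rounding, the first call should give a value at the upper bound of the range and the second a value at the lower bound, since both relations are exactly linear.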

The formula for ρ can be expressed in terms of mean and expectation. Since

$$\operatorname{cov}(X, Y) = E[(X - \mu_X)(Y - \mu_Y)],$$

the formula for ρ can also be written as

$$\rho_{X,Y} = \frac{E[(X - \mu_X)(Y - \mu_Y)]}{\sigma_X \sigma_Y}$$
where:

• $\sigma_X$ and $\sigma_Y$ are defined as above
• $\mu_X$ is the mean of X
• $\mu_Y$ is the mean of Y
• $E$ is the expectation

The formula for ρ can also be expressed in terms of uncentered moments.
Since

• $\mu_X = E[X]$
• $\mu_Y = E[Y]$
• $\sigma_X^2 = E[(X - E[X])^2] = E[X^2] - (E[X])^2$
• $\sigma_Y^2 = E[(Y - E[Y])^2] = E[Y^2] - (E[Y])^2$
• $E[(X - \mu_X)(Y - \mu_Y)] = E[(X - E[X])(Y - E[Y])] = E[XY] - E[X]E[Y],$

the formula for ρ can also be written as

$$\rho_{X,Y} = \frac{E[XY] - E[X]E[Y]}{\sqrt{E[X^2] - (E[X])^2}\,\sqrt{E[Y^2] - (E[Y])^2}}$$
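The covariance identity used in the list above is stated without its intermediate steps. One way to fill them in, using only linearity of expectation and the fact that the means are constants, is sketched here:

```latex
\begin{aligned}
E[(X - \mu_X)(Y - \mu_Y)]
  &= E[XY - \mu_X Y - \mu_Y X + \mu_X \mu_Y] \\
  &= E[XY] - \mu_X E[Y] - \mu_Y E[X] + \mu_X \mu_Y \\
  &= E[XY] - E[X]E[Y] - E[Y]E[X] + E[X]E[Y] \\
  &= E[XY] - E[X]E[Y].
\end{aligned}
```

Setting Y = X in the same expansion gives the variance identity $\sigma_X^2 = E[X^2] - (E[X])^2$ that appears in the denominator.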
For a sample

Pearson's correlation coefficient when applied to a sample is commonly
represented by the letter r and may be referred to as the sample correlation
coefficient or the sample Pearson correlation coefficient. We can obtain a formula
for r by substituting estimates of the covariances and variances based on a sample
into the formula above. So if we have one dataset $\{x_1, \ldots, x_n\}$ containing n values
and another dataset $\{y_1, \ldots, y_n\}$ containing n values, then the formula for r is:

$$r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}$$
where:

• $n$ is the sample size
• $x_i, y_i$ are the individual sample points indexed with i
• $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$ (the sample mean); and analogously for $\bar{y}$

Rearranging gives us this formula for r:

$$r = r_{xy} = \frac{n\sum x_i y_i - \sum x_i \sum y_i}{\sqrt{n\sum x_i^2 - \left(\sum x_i\right)^2}\,\sqrt{n\sum y_i^2 - \left(\sum y_i\right)^2}}$$

where:

• $n, x_i, y_i$ are defined as above
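To make the link between the two sample formulas concrete, here is a minimal sketch that implements both the deviation form and the rearranged sum form and compares them on a small made-up dataset. The function names `pearson_r_deviations` and `pearson_r_sums` and the data values are assumptions introduced for illustration only.

```python
import math


def pearson_r_deviations(xs, ys):
    """Sample r from deviations about the sample means."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    numerator = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    denominator = (math.sqrt(sum((x - x_bar) ** 2 for x in xs))
                   * math.sqrt(sum((y - y_bar) ** 2 for y in ys)))
    return numerator / denominator


def pearson_r_sums(xs, ys):
    """Sample r from the rearranged form that uses only running sums."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_xx = sum(x * x for x in xs)
    sum_yy = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = (math.sqrt(n * sum_xx - sum_x ** 2)
                   * math.sqrt(n * sum_yy - sum_y ** 2))
    return numerator / denominator


# Hypothetical sample, not taken from the text.
xs = [1.0, 2.5, 3.0, 4.5, 6.0]
ys = [2.0, 2.0, 3.5, 5.0, 4.5]
print(pearson_r_deviations(xs, ys))
print(pearson_r_sums(xs, ys))
```

The two results should agree up to floating-point rounding. The sum form needs only one pass over the data and no precomputed means, which suits hand calculation; the deviation form tends to be less prone to cancellation error when the values are large relative to their spread.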

Example:

An agricultural research organization tested a particular chemical fertilizer to
try to find out whether an increase in the amount of fertilizer used would lead to a
corresponding increase in the food supply.

Fertilizer (lbs), X     2   1   3   2   4   5   3
Bushels of beans, Y     4   3   4   3   6   5   5

          x     y    xy   x^2   y^2
          2     4     8     4    16
          1     3     3     1     9
          3     4    12     9    16
          2     3     6     4     9
          4     6    24    16    36
          5     5    25    25    25
          3     5    15     9    25
Total    20    30    93    68   136

Solution:

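$$r = \frac{n\left(\sum xy\right) - \left(\sum x\right)\left(\sum y\right)}{\sqrt{\left[n\left(\sum x^2\right) - \left(\sum x\right)^2\right]\left[n\left(\sum y^2\right) - \left(\sum y\right)^2\right]}}$$

$$r = \frac{7(93) - (20)(30)}{\sqrt{\left[7(68) - (20)^2\right]\left[7(136) - (30)^2\right]}}$$

$$r = \frac{651 - 600}{\sqrt{[476 - 400][952 - 900]}}$$

$$r = \frac{51}{\sqrt{[76][52]}} = \frac{51}{\sqrt{3952}} = \frac{51}{62.8649} = 0.811$$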
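The arithmetic in this solution can be checked with a few lines of Python. This is only a verification sketch: the variable names are made up for the example, and the data are the seven fertilizer and bean observations from the table.

```python
import math

# Data from the example: fertilizer (lbs) and bushels of beans.
fertilizer = [2, 1, 3, 2, 4, 5, 3]
bushels = [4, 3, 4, 3, 6, 5, 5]

n = len(fertilizer)                                        # 7
sum_x = sum(fertilizer)                                    # 20
sum_y = sum(bushels)                                       # 30
sum_xy = sum(x * y for x, y in zip(fertilizer, bushels))   # 93
sum_xx = sum(x * x for x in fertilizer)                    # 68
sum_yy = sum(y * y for y in bushels)                       # 136

numerator = n * sum_xy - sum_x * sum_y                     # 651 - 600 = 51
radicand = (n * sum_xx - sum_x ** 2) * (n * sum_yy - sum_y ** 2)  # 76 * 52 = 3952
r = numerator / math.sqrt(radicand)

print(round(r, 3))  # 0.811
```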
