
4.5 Covariance and Correlation

In earlier sections we have discussed the absence or presence of a relationship between two
random variables: independence or nonindependence. But if there is a relationship, it may be
strong or weak. In this section we discuss two numerical measures of the strength of a
relationship between two random variables, the covariance and the correlation.

To illustrate what we mean by the strength of a relationship between two random variables,
consider two different experiments. In the first, random variables 𝑋 and 𝑌 are measured, where 𝑋
is the weight of a sample of water and 𝑌 is the volume of the same sample of water. Clearly there
is a strong relationship between 𝑋 and 𝑌. If (𝑋, 𝑌) pairs are measured on several samples and the
observed data pairs are plotted, the data points probably would not fall exactly on a straight line,
because of measurement error, impurities in the water, etc. But with careful laboratory technique,
the data points will fall very nearly on a straight line. Now consider another experiment in which
𝑋 and 𝑌 are measured, where 𝑋 is the body weight of a human and 𝑌 is the same human’s height.
Clearly there is also a relationship between 𝑋 and 𝑌 here, but the relationship is not nearly as
strong. We would not expect a plot of (𝑋, 𝑌) pairs measured on different people to form a straight
line, although we might expect to see an upward trend in the plot. The covariance and correlation
are two measures that quantify this difference in the strength of a relationship between two
random variables.

Throughout this section we will frequently be referring to the mean and variance of 𝑋 and
the mean and variance of 𝑌. For these we will use the notation E𝑋 = 𝜇𝑋, E𝑌 = 𝜇𝑌, Var 𝑋 = 𝜎𝑋²,
and Var 𝑌 = 𝜎𝑌². We will assume throughout that 0 < 𝜎𝑋² < ∞ and 0 < 𝜎𝑌² < ∞.

Definition 4.5.1: The covariance of 𝑋 and 𝑌 is the number defined by

Cov(𝑋, 𝑌) = E((𝑋 − 𝜇𝑋)(𝑌 − 𝜇𝑌)).

Definition 4.5.2: The correlation of 𝑋 and 𝑌 is the number defined by


𝜌𝑋𝑌 = Cov(𝑋, 𝑌) / (𝜎𝑋𝜎𝑌).

The value 𝜌𝑋𝑌 is also called the correlation coefficient.
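As a concrete illustration of Definitions 4.5.1 and 4.5.2, the following Python sketch computes the covariance and correlation exactly for a small discrete joint distribution; the pmf values and support points here are invented purely for illustration.

```python
import numpy as np

# A small, made-up joint pmf for (X, Y); rows index x-values, columns y-values.
x_vals = np.array([0.0, 1.0, 2.0])
y_vals = np.array([0.0, 1.0])
pmf = np.array([[0.2, 0.1],
                [0.1, 0.2],
                [0.1, 0.3]])        # entries sum to 1

# Marginal pmfs and means: mu_X = E[X], mu_Y = E[Y]
px = pmf.sum(axis=1)
py = pmf.sum(axis=0)
mu_x = np.dot(x_vals, px)
mu_y = np.dot(y_vals, py)

# Definition 4.5.1: Cov(X, Y) = E[(X - mu_X)(Y - mu_Y)]
dev = np.outer(x_vals - mu_x, y_vals - mu_y)
cov = np.sum(dev * pmf)

# Definition 4.5.2: rho_XY = Cov(X, Y) / (sigma_X * sigma_Y)
sigma_x = np.sqrt(np.dot((x_vals - mu_x) ** 2, px))
sigma_y = np.sqrt(np.dot((y_vals - mu_y) ** 2, py))
rho = cov / (sigma_x * sigma_y)

print(cov, rho)
```

The same two-step pattern (first the means, then the expected product of deviations) is how these measures are computed for any finite joint distribution.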

If large values of 𝑋 tend to be observed with large values of 𝑌 and small values of 𝑋 with small
values of 𝑌, then Cov(𝑋, 𝑌) will be positive. If 𝑋 > 𝜇𝑋, then 𝑌 > 𝜇𝑌 is likely to be true and the
product (𝑋 − 𝜇𝑋)(𝑌 − 𝜇𝑌) will be positive. If 𝑋 < 𝜇𝑋, then 𝑌 < 𝜇𝑌 is likely to be true and the
product (𝑋 − 𝜇𝑋)(𝑌 − 𝜇𝑌) will again be positive. Thus Cov(𝑋, 𝑌) = E(𝑋 − 𝜇𝑋)(𝑌 − 𝜇𝑌) > 0. If
large values of 𝑋 tend to be observed with small values of 𝑌 and small values of 𝑋 with large
values of 𝑌, then Cov(𝑋, 𝑌) will be negative, because when 𝑋 > 𝜇𝑋, 𝑌 will tend to be less than 𝜇𝑌
and vice versa, and hence (𝑋 − 𝜇𝑋)(𝑌 − 𝜇𝑌) will tend to be negative. Thus the sign of Cov(𝑋, 𝑌)
gives information regarding the relationship between 𝑋 and 𝑌.
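The following short simulation illustrates both signs; it is only a sketch, and the particular linear models, noise level, and seed are arbitrary choices made for the demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
x = rng.normal(size=n)
noise = rng.normal(size=n)

# Large X with large Y: an increasing relationship gives positive covariance.
y_pos = x + noise
# Large X with small Y: a decreasing relationship gives negative covariance.
y_neg = -x + noise

# Sample version of E[(X - mu_X)(Y - mu_Y)]
def sample_cov(a, b):
    return np.mean((a - a.mean()) * (b - b.mean()))

print(sample_cov(x, y_pos))   # approximately +1
print(sample_cov(x, y_neg))   # approximately -1
```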

But Cov(𝑋, 𝑌) can be any number and a given value of Cov(𝑋, 𝑌), say Cov(𝑋, 𝑌) = 3, does not in
itself give information about the strength of the relationship between 𝑋 and 𝑌. On the other hand,
the correlation is always between −1 and 1, with the values −1 and 1 indicating a perfect linear
relationship between 𝑋 and 𝑌. This is proved in Theorem 4.5.7.

Before investigating these properties of covariance and correlation, we will first calculate these
measures in a given example. This calculation will be simplified by the following result.

Theorem 4.5.3 For any random variables 𝑋 and 𝑌,

Cov(𝑋, 𝑌) = E𝑋𝑌 − 𝜇𝑋𝜇𝑌.


Proof: Cov(𝑋, 𝑌) = E((𝑋 − 𝜇𝑋)(𝑌 − 𝜇𝑌))

= E(𝑋𝑌 − 𝜇𝑋𝑌 − 𝜇𝑌𝑋 + 𝜇𝑋𝜇𝑌)    (expanding the product)

= E𝑋𝑌 − 𝜇𝑋E𝑌 − 𝜇𝑌E𝑋 + 𝜇𝑋𝜇𝑌    (𝜇𝑋 and 𝜇𝑌 are constants)

= E𝑋𝑌 − 𝜇𝑋𝜇𝑌 − 𝜇𝑌𝜇𝑋 + 𝜇𝑋𝜇𝑌

= E𝑋𝑌 − 𝜇𝑋𝜇𝑌.
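As a quick numerical sanity check of Theorem 4.5.3 (a sketch only; the particular distribution of (𝑋, 𝑌) below is an arbitrary choice), the two expressions for the covariance can be evaluated on the same simulated sample. Applied to sample moments, they agree up to floating-point rounding, since the algebra in the proof carries over verbatim to sample averages.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500_000
x = rng.uniform(size=n)
y = 2.0 * x + rng.uniform(size=n)   # an arbitrary dependent pair

mu_x, mu_y = x.mean(), y.mean()

# Definition 4.5.1 applied to sample moments: E[(X - mu_X)(Y - mu_Y)]
lhs = np.mean((x - mu_x) * (y - mu_y))
# Theorem 4.5.3: Cov(X, Y) = E[XY] - mu_X * mu_Y
rhs = np.mean(x * y) - mu_x * mu_y

print(lhs, rhs)   # the two values agree up to floating-point error
```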

Example 4.5.4 (Correlation-I) Let the joint pdf of (𝑋, 𝑌) be 𝑓(𝑥, 𝑦) = 1, 0 < 𝑥 < 1, 𝑥 < 𝑦 < 𝑥 + 1.
See Figure 4.5.1. The marginal distribution of 𝑋 is uniform(0, 1), so 𝜇𝑋 = 1/2 and 𝜎𝑋² = 1/12. The
marginal pdf of 𝑌 is 𝑓𝑌(𝑦) = 𝑦, 0 < 𝑦 < 1, and 𝑓𝑌(𝑦) = 2 − 𝑦, 1 ≤ 𝑦 ≤ 2, with 𝜇𝑌 = 1 and
𝜎𝑌² = 1/6. We also have
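Since 𝑓(𝑥, 𝑦) = 1 on the band 0 < 𝑥 < 1, 𝑥 < 𝑦 < 𝑥 + 1, the conditional distribution of 𝑌 given 𝑋 = 𝑥 is uniform(𝑥, 𝑥 + 1), so the pair can be simulated as 𝑌 = 𝑋 + 𝑈 with 𝑈 ~ uniform(0, 1) independent of 𝑋. The following Monte Carlo sketch (sample size and seed chosen arbitrarily) checks the moments stated above and estimates the covariance and correlation for this example.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 1_000_000

# X ~ uniform(0, 1); given X = x, Y ~ uniform(x, x + 1), i.e. Y = X + U.
x = rng.uniform(size=n)
y = x + rng.uniform(size=n)

print(y.mean())   # approximately 1    (mu_Y)
print(y.var())    # approximately 1/6  (sigma_Y^2)

# Sample covariance and correlation per Definitions 4.5.1 and 4.5.2.
cov = np.mean((x - x.mean()) * (y - y.mean()))
rho = cov / (x.std() * y.std())

# Since Cov(X, X + U) = Var X = 1/12, the correlation should come out
# near (1/12) / sqrt((1/12)(1/6)) = 1/sqrt(2) ≈ 0.707.
print(cov, rho)
```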
