Abstract
While Bessel’s correction for the sample variance is well known and widely quoted in
statistics texts, a detailed treatment of why the correction is needed at all does not appear
to be prominently covered, beyond a cursory statement that the estimator is biased without
the correction. In this article, we present a mathematical treatment of the
‘uncorrected’ sample variance and explain why it is a biased estimator of the true variance of
a population. This will be of interest to readers who are studying or have studied statistics
but who cannot find the real reason for Bessel’s correction. The reader is assumed to
have some prior knowledge of statistics and probability, particularly in relation to random
variables and the mathematical expectation operator.
Contents
1 Introduction
2 Mathematical notation
1 Introduction
The variance of a population, $\sigma^2$, is an important second-order statistical measure since it gives
an indication of the spread of data around the population mean $\mu$. Assuming that the $i$th datum in
the population is represented as $x_i$ and the number of data in the entire population is $N_p$,
the population variance is defined as:
\[
\sigma^2 = \frac{1}{N_p} \sum_{i=1}^{N_p} (x_i - \mu)^2 \tag{1}
\]
It may be impractical to calculate the population variance directly, perhaps because $N_p$ is
very large or because the data of the entire population are unavailable. In that case, we may
only be given a smaller subset of the population, i.e. $N$ samples (where $N < N_p$). Given the
$N$ samples of the population, we may estimate the population mean and variance by using the
same expressions:
\[
\hat{\mu} = \frac{1}{N} \sum_{i=1}^{N} x_i \tag{3}
\]
\[
\hat{\sigma}^2 = \frac{1}{N} \sum_{i=1}^{N} (x_i - \hat{\mu})^2 \tag{4}
\]
However, as many statistics texts state, the ‘correct’ sample variance should instead be
estimated as:
\[
\hat{\sigma}^2 = \frac{1}{N-1} \sum_{i=1}^{N} (x_i - \hat{\mu})^2 \tag{5}
\]
The modification from $\frac{1}{N}$ to $\frac{1}{N-1}$ is called Bessel’s correction. The reason usually given
for using Eq. (5) rather than Eq. (4) is that the latter estimate is biased. The problem
is that the reader is seldom told much more about why Eq. (4) is considered biased.
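To make the difference between Eq. (4) and Eq. (5) concrete, the following is a minimal sketch that evaluates both on a small data set. The function names and the five data values are hypothetical, chosen only for illustration. Python's standard `statistics` module is used as a cross-check: `statistics.pvariance` divides by $N$ and `statistics.variance` divides by $N-1$.

```python
import statistics


def uncorrected_variance(samples):
    """Eq. (4): sum of squared deviations from the sample mean, divided by N."""
    n = len(samples)
    mean = sum(samples) / n
    return sum((x - mean) ** 2 for x in samples) / n


def corrected_variance(samples):
    """Eq. (5): the same sum divided by N - 1 (Bessel's correction)."""
    n = len(samples)
    mean = sum(samples) / n
    return sum((x - mean) ** 2 for x in samples) / (n - 1)


# Hypothetical data, not taken from the article.
samples = [2.0, 4.0, 4.0, 5.0, 7.0]

biased = uncorrected_variance(samples)
unbiased = corrected_variance(samples)

print("Eq. (4), divide by N:    ", biased)
print("Eq. (5), divide by N - 1:", unbiased)
print("statistics.pvariance:    ", statistics.pvariance(samples))
print("statistics.variance:     ", statistics.variance(samples))
```

The two estimates differ only by the factor $N/(N-1)$, so the gap between them shrinks as $N$ grows.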
This article will attempt to give a mathematical treatment of the sample variance as
expressed in Eq. (4). The structure of the article is as follows. The next section will deal with the
concept of bias in estimators when dealing with random variables. We then present the
mathematical treatment by finding the expected value of the biased sample variance estimator
and showing that it is not equal to the population variance.
2 Mathematical notation
As this article will present several mathematical derivations, it is particularly helpful to present
the variable notation that will be used.
$X$                 random variable
$x_i$               $i$th sample or realisation of the random variable $X$
$E\{\cdot\}$        expectation operator
$\mu$               population mean
$\sigma^2$          population variance
$\hat{\mu}$         sample mean
$\hat{\sigma}^2$    sample variance
\[
B = E\{\hat{\mu}\} - \mu \tag{7}
\]
Therefore, the aim of this paper is to show that the average or expected value of the sample
variance of (4) is not equal to the true population variance:
\[
E\{\hat{\sigma}^2\} \neq \sigma^2 \tag{8}
\]
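The claim in (8) can be illustrated numerically before it is derived. The sketch below is a Monte Carlo experiment under assumptions that are not part of the article: a standard normal population (so $\mu = 0$ and $\sigma^2 = 1$), a sample size of $N = 5$, 20000 repeated trials, and a fixed random seed. It averages the uncorrected sample variance of Eq. (4) over the trials as a stand-in for the expectation.

```python
import random


def uncorrected_variance(samples):
    """Eq. (4): sum of squared deviations from the sample mean, divided by N."""
    n = len(samples)
    mean = sum(samples) / n
    return sum((x - mean) ** 2 for x in samples) / n


rng = random.Random(0)   # fixed seed, an arbitrary choice for repeatability
true_variance = 1.0      # assumed population: standard normal
n = 5                    # assumed sample size
trials = 20000           # assumed number of repeated experiments

# Draw a fresh sample of size n in every trial and record its uncorrected variance.
estimates = []
for _ in range(trials):
    samples = [rng.gauss(0.0, 1.0) for _ in range(n)]
    estimates.append(uncorrected_variance(samples))

# The average over many trials approximates E{sigma_hat^2}.
average_estimate = sum(estimates) / trials

print("average uncorrected estimate:", average_estimate)
print("true population variance:    ", true_variance)
```

With these settings the average settles noticeably below the true variance, which is the systematic shortfall that the word "biased" refers to. A simulation only suggests the result; it does not replace the derivation.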
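\begin{align}
E\{\hat{\sigma}_X^2\} &= E\left\{ \frac{1}{N} \sum_{i=1}^{N} (x_i - \hat{\mu}_X)^2 \right\} \tag{9} \\
&= E\left\{ \frac{1}{N} \sum_{i=1}^{N} \left( x_i^2 - 2 x_i \hat{\mu}_X + \hat{\mu}_X^2 \right) \right\} \tag{10} \\
&= E\left\{ \frac{1}{N} \sum_{i=1}^{N} x_i^2 - 2 \hat{\mu}_X \frac{1}{N} \sum_{i=1}^{N} x_i + \hat{\mu}_X^2 \right\} \tag{11} \\
&= E\left\{ \frac{1}{N} \sum_{i=1}^{N} x_i^2 - \hat{\mu}_X^2 \right\} \tag{12} \\
&= E\left\{ \frac{1}{N} \sum_{i=1}^{N} x_i^2 - \left( \frac{1}{N} \sum_{i=1}^{N} x_i \right)^2 \right\} \tag{13} \\
&= E\left\{ \frac{1}{N} \sum_{i=1}^{N} x_i^2 - \frac{1}{N^2} \left( \sum_{i=1}^{N} x_i^2 + \sum_{i=1}^{N} \sum_{\substack{j=1 \\ i \neq j}}^{N} x_i x_j \right) \right\} \tag{14} \\
&= \frac{1}{N} \sum_{i=1}^{N} E\{x_i^2\} - \frac{1}{N^2} \left( \sum_{i=1}^{N} E\{x_i^2\} + \sum_{i=1}^{N} \sum_{\substack{j=1 \\ i \neq j}}^{N} E\{x_i x_j\} \right) \tag{15} \\
&= \frac{1}{N} \sum_{i=1}^{N} E\{x_i^2\} - \frac{1}{N^2} \left( \sum_{i=1}^{N} E\{x_i^2\} + \sum_{i=1}^{N} \sum_{\substack{j=1 \\ i \neq j}}^{N} \mu_X^2 \right) \tag{16} \\
&= \frac{1}{N} \sum_{i=1}^{N} E\{x_i^2\} - \frac{1}{N^2} \left( \sum_{i=1}^{N} E\{x_i^2\} + (N^2 - N) \mu_X^2 \right) \tag{17} \\
&= \frac{N-1}{N^2} \sum_{i=1}^{N} E\{x_i^2\} - \frac{N-1}{N} \mu_X^2 \tag{18} \\
&= \frac{N-1}{N} E\{x_i^2\} - \frac{N-1}{N} \mu_X^2 \tag{19} \\
&= \frac{N-1}{N} (\sigma_X^2 + \mu_X^2) - \frac{N-1}{N} \mu_X^2 \tag{20} \\
&= \frac{N-1}{N} \sigma_X^2 \tag{21} \\
&\neq \sigma_X^2 \tag{22}
\end{align}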
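The result in (21) can be checked exactly on a small case. The sketch below assumes a hypothetical population, the six faces of a fair die, and treats the $N$ samples as independent draws with replacement, which matches the independence used in the derivation. It enumerates every possible sample of size $N$, averages the uncorrected variance of Eq. (4) over all of them using exact rational arithmetic, and compares the result with $\frac{N-1}{N}\sigma_X^2$.

```python
from fractions import Fraction
from itertools import product

# Hypothetical population: the faces of a fair die, each equally likely.
population = [Fraction(v) for v in (1, 2, 3, 4, 5, 6)]

mu = sum(population) / len(population)                               # population mean
sigma2 = sum((x - mu) ** 2 for x in population) / len(population)    # population variance, Eq. (1)


def expected_uncorrected_variance(n):
    """Exact E{sigma_hat^2} for n independent draws, by full enumeration."""
    total = Fraction(0)
    count = 0
    # Every ordered n-tuple of faces is equally likely under independent draws.
    for samples in product(population, repeat=n):
        mean = sum(samples) / n
        total += sum((x - mean) ** 2 for x in samples) / n   # Eq. (4)
        count += 1
    return total / count


for n in (2, 3, 4):
    expected = expected_uncorrected_variance(n)
    predicted = Fraction(n - 1, n) * sigma2                  # right-hand side of (21)
    print(f"N={n}: enumerated={expected}, (N-1)/N * sigma^2={predicted}")
```

Because the arithmetic is done with `Fraction`, the two columns agree exactly rather than approximately, for each sample size tried.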
Therefore, the sample variance without Bessel’s correction is a biased estimator of the
population variance. Note that we have used the following expressions to simplify (15) and (19),
respectively: