
1 Preliminaries

1.1 Motivation
We will be dealing with measurements of several variables for each of n experimental units or individuals.
The variables are of two types (though the distinction between them is not always rigid in applications):
those of primary interest to the investigator and those which might provide supplementary or background
information. The variables of the former type are called response, outcome or dependent variables, while
those of the latter type are called explanatory, independent or predictor variables. Econometricians also use the
terms endogenous and exogenous to distinguish the two types of variables. The explanatory variables are
used to predict or to understand the response variables.
We distinguish between a functional relation and a statistical relation. The functional relation between
the explanatory variable X and the response variable Y is often expressed as a mathematical formula

Y = g(X)

and the main feature of this relation is that the observations (xi , yi ) (i = 1, . . . , n) fall directly on the “curve”
of the relationship, that is, on the curve y = g(x).
A statistical relation, unlike a functional relation, is not a “perfect” one. Given the random nature of the
variables involved, we will assume that
Y = g(X) + ε,
where g is the regression function or in this case the systematic component, and ε is the random component
(the error term). In most applications ε is a normal random variable with mean zero and variance σ²
(ε ∼ N(0, σ²)).
The systematic component g is often expressed in terms of explanatory variables through a parametric
equation. If, for example, it is supposed that

g(x) = A + Bx + Cx²

or
g(x) = A·2^x + B
or
g(x) = A log x + B,
then the problem is reduced to one of identifying a few parameters, here labeled as A, B, C. In each of these
three forms for g given above, g is linear in these parameters. It is the linearity in the parameters which
makes the model a linear statistical model.

1.2 The model of measurements


Let µ be an unknown quantity of interest which can be measured with some error. A mathematical
(statistical) model for this experiment is specified by the following equation (the model equation)

Y = µ + ε,

where Y is the available measurement (observation) and ε is a random error modelled as a normally
distributed random variable with zero mean and variance σ², i.e. ε ∼ N(0, σ²). By properties of the normal
distribution, we have that Y ∼ N(µ, σ²). Suppose that we have n measurements

Yi = µ + εi,

where εi ∼ N(0, σ²) (i = 1, ..., n) are independent. It follows that Y1, ..., Yn are independent random
variables with Yi ∼ N(µ, σ²). In other words, Y1, ..., Yn is a random sample from a normally distributed
population with mean µ and variance σ², so that the problem of estimating the unknown quantity µ is the
well-known (from 1st Year statistics) problem of estimating the population mean of a normal population.
The sample mean

Ȳ = (1/n)(Y1 + ... + Yn) = (1/n) Σ_{i=1}^n Yi

is usually used as a point estimator of µ. In MT130 we only briefly mentioned that there are some general
methods of obtaining point estimators in statistics. In this course we are going to use one of these methods,
namely, the method of least squares (LS). To demonstrate the main idea of this method, let us consider
the case of the model of measurements.
Given observations Y1, . . . , Yn define the following function

S(µ) = Σ_{i=1}^n (Yi − µ)².

The value of µ that minimises S(µ) is called the least squares estimator of µ. We can find the point of
minimum of S(µ) by equating the first derivative of S(µ) to zero:

S′(µ) = −2 Σ_{i=1}^n (Yi − µ) = −2 ( Σ_{i=1}^n Yi − nµ ) = 0.

It is now easy to see that the solution of this equation is Ȳ = (1/n) Σ_{i=1}^n Yi, the sample mean, and this
is the point of minimum, as the second derivative of S is 2n > 0.
There is also a direct way to see that the sample mean is the point of minimum and, hence, the least
squares estimator of µ. Indeed,
S(µ) = Σ_{i=1}^n (Yi² − 2Yiµ + µ²) = −2nµȲ + nµ² + nȲ² − nȲ² + Σ_{i=1}^n Yi²

     = n(µ − Ȳ)² + ( Σ_{i=1}^n Yi² − nȲ² ) ≥ Σ_{i=1}^n Yi² − nȲ²,

where the inequality becomes an equality if and only if µ = Ȳ.


Note finally that

S(Ȳ) = Σ_{i=1}^n Yi² − nȲ² = (n − 1)s²,

where s² is the sample variance, which is the point estimator of another model parameter σ² (see the next
section).
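
To see the least squares idea concretely, here is a minimal Python sketch (the data vector y and its values
are hypothetical) that minimises S(µ) numerically and checks that the minimiser agrees with the sample
mean and that S(Ȳ) = (n − 1)s².

    import numpy as np
    from scipy.optimize import minimize_scalar

    # hypothetical measurements of an unknown quantity mu
    y = np.array([10.2, 9.8, 10.1, 10.4, 9.9])

    def S(mu):
        # least squares criterion S(mu) = sum_i (Y_i - mu)^2
        return np.sum((y - mu) ** 2)

    res = minimize_scalar(S)                          # numerical minimiser of S
    print(res.x, y.mean())                            # both ~10.08: the LS estimator is Ybar
    print(S(y.mean()), (len(y) - 1) * y.var(ddof=1))  # S(Ybar) = (n - 1) s^2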

1.3 Short revision of 1st Year statistics


The process of making statements about population parameters based on the information contained in a
sample is known as parametric statistical inference.

Example 1. A mechanical jar filler for filling jars with coffee does not fill every jar with the same quantity.
The weight of coffee Y filled in a jar is a random variable which can be assumed to be normally distributed
with mean value µ and variance σ² (Y ∼ N(µ, σ²)). Suppose that we have a sample of n independent
measurements on Y and wish to “identify” the parameters of the population (µ, σ²).

The sort of statements that we wish to make about parameters will often fall into one of the following
three categories:

• Point estimation;

• Interval estimation;

• Hypotheses testing.

1.3.1 Point estimation


Point estimation is the aspect of statistical inference in which we wish to find the “best guess” of the true
value of a population parameter.
Suppose that Y1 , Y2 , · · · , Yn is a sample of size n. Then an estimator of an unknown parameter θ is some
function of the observations Y1 , Y2 , · · · , Yn , that is

θ̂ = θ̂(Y1 , Y2 , · · · , Yn )

(which is in some sense a “good approximation” to the unknown parameter θ).


A point estimator of µ in a N(µ, σ²) population is provided by the sample mean Ȳ, which is defined by

Ȳ = (1/n) Σ_{i=1}^n Yi = (1/n)(Y1 + Y2 + · · · + Yn).

To estimate σ² in a N(µ, σ²) population we generally use as its estimator the sample variance s² defined by

s² = (1/(n − 1)) Σ_{i=1}^n (Yi − Ȳ)² = (1/(n − 1)) ( Σ_{i=1}^n Yi² − nȲ² ).
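
As a quick illustration, a short Python sketch (sample values hypothetical) computes both estimators; the
“shortcut” form of s² agrees with the definition, and numpy's var(ddof=1) uses the same n − 1 divisor.

    import numpy as np

    y = np.array([483.9, 485.2, 484.7, 486.1, 484.0])    # hypothetical sample
    n = len(y)

    ybar = y.sum() / n                                   # sample mean
    s2_def = np.sum((y - ybar) ** 2) / (n - 1)           # definition of s^2
    s2_alt = (np.sum(y ** 2) - n * ybar ** 2) / (n - 1)  # equivalent shortcut form
    print(ybar, s2_def, s2_alt, y.var(ddof=1))           # the three variances agree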

Properties of Estimators. Let θ̂ = θ̂(Y1, Y2, · · ·, Yn) be an estimator of an unknown parameter θ. To
clarify in what sense θ̂ is a “good approximation” to θ we consider estimators which are (1) unbiased and
(2) consistent.
(1) θ̂ is said to be an unbiased estimator of θ if E(θ̂) = θ.
In the example, Ȳ is an unbiased estimator of µ and s² is an unbiased estimator of σ².
To check whether we have a sensible estimator we need to ensure that θ̂ is increasingly likely to yield
the right answer θ as the sample size n gets bigger. The mean square error (MSE) of θ̂ is defined to be
E(θ̂ − θ)². Since the MSE of θ̂ is the average squared distance of θ̂ from the true value θ, a good estimator
is one with a small MSE.
(2) θ̂ is said to be a consistent estimator of θ if

MSE(θ̂) → 0 as n → ∞.

Note that if θ̂ is unbiased then it is also consistent provided Var(θ̂) → 0, since for an unbiased estimator
MSE(θ̂) = Var(θ̂).


In the example, Ȳ is an unbiased and consistent estimator of µ, and s² is an unbiased estimator of σ².
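
Both properties can be illustrated by simulation; a sketch with arbitrarily chosen population parameters
(µ = 5, σ = 2), showing that the average of Ȳ over many samples stays near µ while its MSE, roughly
σ²/n, shrinks as n grows.

    import numpy as np

    rng = np.random.default_rng(0)
    mu, sigma = 5.0, 2.0                # arbitrary population parameters

    for n in (10, 100, 1000):
        # 10000 independent samples of size n, one sample per row
        ybars = rng.normal(mu, sigma, size=(10000, n)).mean(axis=1)
        mse = np.mean((ybars - mu) ** 2)
        print(n, ybars.mean(), mse)     # mean stays near mu; MSE ~ sigma^2 / n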

1.3.2 Interval estimation


We will often need to find a range of values within which we are “almost certain” that the true parameter
values lie. Such a range of values is known as a confidence interval (C.I.) for the unknown parameter.
In the example, if σ is known, then to construct a confidence interval for µ recall that Ȳ is a linear
combination of independent N(µ, σ²) random variables (Ȳ = (1/n) Σ_{i=1}^n Yi) and is therefore normally
distributed with mean µ (unbiased) and variance σ²/n, that is, Ȳ ∼ N(µ, σ²/n). It follows that if σ² is
known, then

Z = (Ȳ − µ)/(σ/√n) ∼ N(0, 1).
This fact yields that, given 0 < α < 1,

P( Ȳ − z_{α/2} σ/√n ≤ µ ≤ Ȳ + z_{α/2} σ/√n ) = 1 − α,

so that, if σ is known, then we have the following (1 − α)100% confidence interval for µ:

( Ȳ − z_{α/2} σ/√n , Ȳ + z_{α/2} σ/√n ).

If σ is unknown, then

t = (Ȳ − µ)/(s/√n) ∼ t_{n−1},

where t_{n−1} is a random variable with the t-distribution with n − 1 degrees of freedom, and the (1 − α)100%
confidence interval for µ is

( Ȳ − t_{n−1,α/2} s/√n , Ȳ + t_{n−1,α/2} s/√n ).
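
A minimal sketch of both intervals (the sample y and the “known” σ are hypothetical), with scipy supplying
the critical values z_{α/2} and t_{n−1,α/2}.

    import numpy as np
    from scipy.stats import norm, t

    y = np.array([484.1, 485.0, 483.6, 486.2, 484.8])  # hypothetical sample
    n, alpha = len(y), 0.05
    ybar, s = y.mean(), y.std(ddof=1)

    # sigma known (hypothetically sigma = 1.5): z-interval
    sigma = 1.5
    z = norm.ppf(1 - alpha / 2)                        # z_{alpha/2}
    print(ybar - z * sigma / np.sqrt(n), ybar + z * sigma / np.sqrt(n))

    # sigma unknown: t-interval with n - 1 degrees of freedom
    c = t.ppf(1 - alpha / 2, df=n - 1)                 # t_{n-1, alpha/2}
    print(ybar - c * s / np.sqrt(n), ybar + c * s / np.sqrt(n))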
Example 2. Jars of coffee are labeled as 484 grams in weight. A random sample of ten jars from a
production line are opened and weighed accurately. Suppose the results Y1, ..., Y10 gave

Y1 + · · · + Y10 = 4847.5,   Y1² + · · · + Y10² = 2349850.

Find the 95% CI for the true population mean of jar weights.

Using the information provided we find

Ȳ = (1/10)(Y1 + ... + Y10) = 484.75

and

s² = (1/9)( Σ_{i=1}^{10} Yi² − 10Ȳ² ) = 3.24.

Therefore, the 95% CI for the population mean is

484.75 ± t_{9,0.025} √(s²/n) = (483.462, 486.038),

where the critical value t_{9,0.025} = 2.262 is found in Tables.
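
The interval is easy to check in a couple of lines, taking Ȳ = 484.75 and s² = 3.24 as computed above and
letting scipy supply the critical value.

    from scipy.stats import t

    n, ybar, s2 = 10, 484.75, 3.24                   # values from Example 2
    half = t.ppf(0.975, df=n - 1) * (s2 / n) ** 0.5  # t_{9,0.025} * sqrt(s^2 / n)
    print(ybar - half, ybar + half)                  # ~ (483.462, 486.038)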

1.3.3 Hypotheses testing


Often an investigator has a theory about the phenomenon under study, and wishes to see whether this
theory is confirmed by the data that have been collected. The null hypothesis H0 is, usually, what we are
prepared to “go along with” until we obtain convincing evidence in favour of the alternative hypothesis H1 .
To conduct a hypothesis test we need to complete the following steps.

(1) Specify the null and alternative hypotheses.

(2) Choose a test statistic T which is such that
◦ T behaves differently under the null and alternative hypotheses;
◦ the sampling distribution of T is fully specified when H0 is true.

(3) Formulate some decision rule based on the statistic T .

Whatever decision rule is adopted, there is some chance of reaching an erroneous conclusion about the
population parameter of interest. One error that could be made, called a Type I error, is the rejection of a
true null hypothesis. If the decision rule is such that the probability of rejecting a true null hypothesis
is α, then α is said to be the significance level of the test. The other possible error, called Type II error,
arises when a false null hypothesis is accepted. Suppose that for a particular decision rule, the probability
of making such an error is β. Then, the probability of rejecting a false null hypothesis is (1 − β), which is
called the power of the test.

                        NULL HYPOTHESIS TRUE        NULL HYPOTHESIS FALSE

  ACCEPT                Correct decision            Type II error
                        Probability = 1 − α         Probability = β

  REJECT                Type I error                Correct decision
                        Probability = α             Probability = 1 − β
                        (significance level)        (power)

Ideally we would like to have the probabilities of both types of error be as small as possible. However,
in general, once a sample has been taken, any adjustment to the decision rule to reduce the probability α
of type I error automatically increases the probability β of type II error. The only way of simultaneously
lowering both α and β would be to obtain more information about the population, e.g., by taking a larger
sample. In practice we usually specify the significance level (type I error) α to have a small value such as
0.10, 0.05, 0.025, or 0.01. This then determines the probability β of a Type II error (if there is a choice of
tests then we prefer the one with the smallest β, that is, with the highest power 1 − β). For a given
significance level, the bigger the sample size, the higher the power of the test.
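
To illustrate the last point, for the one-sided z-test of H0 : µ = µ0 against H1 : µ > µ0 developed below,
the power at a true mean µ1 > µ0 has the closed form 1 − Φ(zα − (µ1 − µ0)√n/σ); a sketch with arbitrarily
chosen µ0, µ1 and σ shows the power rising with n.

    import numpy as np
    from scipy.stats import norm

    mu0, mu1, sigma, alpha = 0.0, 0.5, 2.0, 0.05  # arbitrary illustration values
    z_alpha = norm.ppf(1 - alpha)

    for n in (10, 30, 100):
        # P( (Ybar - mu0) / (sigma / sqrt(n)) > z_alpha ) when the true mean is mu1
        power = 1 - norm.cdf(z_alpha - (mu1 - mu0) * np.sqrt(n) / sigma)
        print(n, round(power, 3))                 # power grows with the sample size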
So, the statistical model is already specified: we have a random sample of n observations Y1, Y2, . . . , Yn
with Yi ∼ N(µ, σ²). The objective is to test hypotheses about the unknown population mean.
Consider the problem of testing the simple null hypothesis that the population mean is equal to some
specified value µ0
H0 : µ = µ0
against one of the following three alternative hypotheses

(i) H1 : µ > µ0,  (ii) H1 : µ < µ0,  (iii) H1 : µ ≠ µ0.

Test of the mean of a normal distribution: Population variance known


Assume first that the population variance is known. For all three cases, when the null hypothesis is true
we have

Z = (Ȳ − µ0)/(σ/√n) ∼ N(0, 1).

If H1 is true then in case (i) the r.v. Z will tend to be larger (for (ii) Z will tend to be smaller and for
(iii) the absolute value of Z will tend to be larger) than would be expected for a standard normal random
variable. Let us denote by zα the number for which

Pr{Z > zα} = α,

where Z ∼ N(0, 1). Then a test with significance level α (type I error) is obtained from the decision rule:

(i) For H1 : µ > µ0, reject H0 if (ȳ − µ0)/(σ/√n) > zα;

(ii) For H1 : µ < µ0, reject H0 if (ȳ − µ0)/(σ/√n) < −zα;

(iii) For H1 : µ ≠ µ0, reject H0 if |ȳ − µ0|/(σ/√n) > z_{α/2}.
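
A minimal sketch of the three rules (data, µ0 and the known σ are hypothetical); each printed boolean
reports whether the corresponding rule rejects H0.

    import numpy as np
    from scipy.stats import norm

    y = np.array([10.6, 9.9, 10.8, 10.3, 10.5])  # hypothetical data
    mu0, sigma, alpha = 10.0, 0.5, 0.05          # sigma assumed known

    z_obs = (y.mean() - mu0) / (sigma / np.sqrt(len(y)))

    print(z_obs > norm.ppf(1 - alpha))           # (i)   reject if z_obs > z_alpha
    print(z_obs < -norm.ppf(1 - alpha))          # (ii)  reject if z_obs < -z_alpha
    print(abs(z_obs) > norm.ppf(1 - alpha / 2))  # (iii) reject if |z_obs| > z_{alpha/2}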

Test of the mean of a normal distribution: Population variance unknown


Suppose now that the population variance is no longer assumed known. If the sample size is not large,
the procedures discussed above are no longer appropriate.
To perform a testing procedure we replace σ² by its estimator, the sample variance s²:

T = (Ȳ − µ0)/(s/√n).
If the null hypothesis is true then the r.v. T follows a Student's t-distribution with n − 1 degrees
of freedom (t_{n−1}). We can then use precisely the same arguments as above, with the Student's
t-distribution now playing the role of the standard normal distribution.
Let us denote by cα the number for which

Pr{T > cα} = α, where T ∼ t_{n−1}.

Then a test with significance level α (type I error) is obtained from the decision rule:

(i) For H1 : µ > µ0, reject H0 if (ȳ − µ0)/(s/√n) > cα;

(ii) For H1 : µ < µ0, reject H0 if (ȳ − µ0)/(s/√n) < −cα;

(iii) For H1 : µ ≠ µ0, reject H0 if |ȳ − µ0|/(s/√n) > c_{α/2}.
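
The same rules with the estimated variance; a sketch (hypothetical data) that also checks the hand-computed
statistic against scipy.stats.ttest_1samp, whose two-sided p-value gives an equivalent decision in case (iii).

    import numpy as np
    from scipy.stats import t, ttest_1samp

    y = np.array([10.6, 9.9, 10.8, 10.3, 10.5])  # hypothetical data
    mu0, alpha = 10.0, 0.05
    n = len(y)

    t_obs = (y.mean() - mu0) / (y.std(ddof=1) / np.sqrt(n))
    c = t.ppf(1 - alpha / 2, df=n - 1)           # c_{alpha/2}
    print(abs(t_obs) > c)                        # (iii) reject H0 if |t_obs| > c_{alpha/2}

    stat, p = ttest_1samp(y, mu0)                # same statistic, two-sided p-value
    print(stat, p, p < alpha)                    # p < alpha iff the rule above rejects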

Test of the mean of a normal distribution: Large sample sizes

Suppose that we have a random sample of n observations from a population with mean µ and variance σ².
If the sample size n is large (n ≥ 30), the test procedures developed for the case where the population
variance is known can be employed when it is unknown, replacing σ² by the observed sample variance s².
Moreover, these procedures remain approximately valid even if the population distribution is not normal.

P-value

The smallest significance level at which a null hypothesis can be rejected is called the probability value
or p-value of the test.
The p-value gives the probability of observing a value as extreme as the one we have, when the null
hypothesis is true. Suppose that the data produces the value Tobs of the test statistic T . Then we assume
that H0 is true and calculate the probability p of observing a value of T that is either as extreme as Tobs or
more extreme than Tobs in the direction of departure of H1 from H0 . For example, in the above procedures,
if we test H0 : µ = µ0 against H1 : µ > µ0 then the values of T more extreme than Tobs in the direction
of departure of H1 are those with T > Tobs. On the other hand, if H1 : µ ≠ µ0, then “more extreme”
means either T > |Tobs| or T < −|Tobs|.
The decision rule with p-value: given the significance level α, we reject H0 if p < α.

Example 2 continued. Jars of coffee are labeled as 484 grams in weight. A random sample of ten jars
from a production line are opened and weighed accurately. Suppose the results Y1, ..., Y10 gave

Y1 + · · · + Y10 = 4847.5,   Y1² + · · · + Y10² = 2349850.

Test, at the 5% significance level, the hypothesis that the jars are labeled correctly. It can be assumed that
weights are normally distributed with unknown standard deviation σ.
In this example, the obvious choices of null and alternative hypotheses are

H0 : µ = 484,

H1 : µ ≠ 484.
Significance level 0.05 (that is, 5%) is specified. The standard deviation σ is unknown, but from the
information provided we can estimate it by the sample standard deviation s = √s² (= 1.8 grams).
The test statistic is the t-statistic

t = (Ȳ − µ0)/(s/√n) ∼ t9,

which has a t-distribution with 9 = 10 − 1 degrees of freedom. Recall from the above that Ȳ = 484.75 and
s² = 3.24. Therefore, for the sample we obtain

tobs = (484.75 − 484)/(1.8/√10) ≈ 1.318.

Decision with the critical values (acceptance/rejection regions). From the tables of the t-distribution we find
that t_{9,0.025} = 2.262, so the corresponding rejection region (or critical region) is (−∞, −2.262) ∪ (2.262, +∞).
Then, since −2.262 < 1.318 < 2.262, we say that at the 5% significance level, the data do not provide enough
evidence for rejection of the null hypothesis.
Decision with the p-value. H1 is two-sided, so that

p-value = 2(1 − P(t9 ≤ tobs)) = 2(1 − P(t9 ≤ 1.318)).

The value P(t9 ≤ 1.318) is not available from Tables but can be bracketed by the closest available values,

P(t9 ≤ 1.3) ≤ P(t9 ≤ 1.318) ≤ P(t9 ≤ 1.4),

that is,

0.887 ≤ P(t9 ≤ 1.318) ≤ 0.9025,

which gives that the p-value is not less than 0.195 and, hence, not less than 0.05, so we accept H0.
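
With scipy's t-distribution the bracketing from Tables is unnecessary; a short check of tobs and the exact
two-sided p-value.

    from scipy.stats import t

    t_obs = (484.75 - 484) / (1.8 / 10 ** 0.5)  # ~ 1.318
    p = 2 * (1 - t.cdf(t_obs, df=9))            # two-sided p-value, ~ 0.22
    print(t_obs, p, p < 0.05)                   # p > 0.05, so H0 is not rejected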
