
Estimation

Statistical Inference
Estimation
Hypothesis Testing

Statistical inference is the procedure whereby inferences about a population are made on the basis of the results obtained from a sample drawn from that population
Confidence Interval for a Population Mean
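[Figure: sampling distribution $f(\bar{x})$ of the sample mean, with area $(1-\alpha) = 0.95$ within $2\sigma_{\bar{x}}$ of the center, and intervals $\bar{x}_i \pm 2\sigma_{\bar{x}}$ built around several sample means]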
Confidence Interval for a Population
Mean
Example 5.2.1: Suppose a researcher, interested in obtaining an estimate of the average level of some enzyme in a certain human population, takes a sample of 10 individuals, determines the level of the enzyme in each, and computes a sample mean of $\bar{x} = 22$
Suppose, further, it is known that the variable of interest is approximately normally distributed with a variance of 45
We wish to estimate $\mu$ with 95% confidence
Confidence Interval for a Population
Mean
In general, an interval estimate can be expressed as

estimator ± (reliability coefficient) × (standard error)

When sampling is from a normal distribution with known variance, an interval estimate for $\mu$ may be expressed as

$\bar{x} \pm z_{(1-\alpha/2)}\,\sigma_{\bar{x}}$

Confidence Interval for a Population Mean
We are $100(1-\alpha)$ percent confident that the single computed interval

$\bar{x} \pm z_{(1-\alpha/2)}\,\sigma_{\bar{x}}$

contains the population mean, $\mu$.

Example 5.2.2
A physical therapist wished to estimate, with 99
percent confidence, the mean maximal strength
of a particular muscle in a certain group of
individuals.
He is willing to assume that strength scores are
approximately normally distributed with a
variance of 144.
A sample of 15 subjects who participated in the
experiment yielded a mean of 84.3
Example 5.2.2
For 99% confidence, the confidence coefficient is 0.99
The z value for 0.99 is 2.58
You can find this value from Table C or MS Excel
This is our reliability coefficient

The standard error is
$\sigma_{\bar{x}} = \sqrt{144/15} = 3.10$
Example 5.2.2
The 99% confidence interval for $\mu$ is
$84.3 \pm 2.58(3.10)$
$84.3 \pm 8.0$
$76.3,\ 92.3$

So, we say that we are 99 percent confident that the population mean is between 76.3 and 92.3 since, in repeated sampling, 99 percent of all intervals that could be constructed in the manner just described would include the population mean
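The known-variance interval above can be sketched in a few lines of Python. This is an illustrative sketch only: the function name `z_interval_mean` is an assumption made here, and the reliability coefficient comes from the standard library's `NormalDist` rather than from Table C, so the unrounded limits differ slightly from the hand calculation before rounding.

```python
from math import sqrt
from statistics import NormalDist


def z_interval_mean(xbar, variance, n, confidence):
    """Interval for a mean when the population variance is known."""
    alpha = 1.0 - confidence
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # reliability coefficient
    se = sqrt(variance / n)                      # standard error of the mean
    margin = z * se
    return xbar - margin, xbar + margin


# Example 5.2.2: n = 15, sample mean 84.3, variance 144, 99% confidence
lower, upper = z_interval_mean(84.3, 144.0, 15, 0.99)
print(f"{lower:.1f}, {upper:.1f}")
```

Rounded to one decimal place, the limits agree with the 76.3 and 92.3 obtained above.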
Sampling from Nonnormal
Populations
Example 5.2.3
Punctuality of patients in keeping appointments is of interest
to a research team.
In a study of patient flow through the offices of general practitioners, it was found that a sample of 35 patients were 17.2 minutes late for appointments, on the average.
Previous research had shown the standard deviation to be about 8 minutes.
The population distribution was felt to be nonnormal
What is the 90 percent confidence interval for $\mu$, the true average amount of time late for appointments?

Sampling from Nonnormal Populations
Because the sample size is large (greater than 30), we can use the central limit theorem and say that the sampling distribution of $\bar{x}$ is approximately normal
From Table C or Excel, the reliability coefficient corresponding to a confidence coefficient of 0.90 is about 1.645
The standard error is
$\sigma_{\bar{x}} = 8/\sqrt{35} = 1.35$
Example 5.2.3
The 90% confidence interval for $\mu$ is
$17.2 \pm 1.645(1.35)$
$17.2 \pm 2.2$
$15.0,\ 19.4$

So, we say that we are 90 percent confident that the population mean is between 15.0 and 19.4 since, in repeated sampling, 90 percent of all intervals that could be constructed in the manner just described would include the population mean
Example 5.2.4
The activity values of a certain enzyme were measured in normal gastric tissue of 35 patients with gastric carcinoma.
Let us assume that the population variance is known to be 0.36
Here we want to construct a 95% confidence interval for the population mean
Sample mean: 0.717971429; margin of error $1.96\sqrt{0.36/35}$: 0.198776507
95% confidence interval: 0.519194922 to 0.91674794
The t Distribution
The procedures we have seen in this chapter so far require knowledge of the variance of the population from which the sample is drawn or is supposed to be drawn
But it would be strange to know the population variance and not know the population mean
In many cases we know neither the variance nor the mean of the population, and this creates a problem for constructing confidence intervals

The t Distribution
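Cases like this require us to use the sample standard deviation

$s = \sqrt{\dfrac{\sum (x_i - \bar{x})^2}{n-1}}$

instead of the population standard deviation $\sigma$

In this case the statistic follows the t distribution and is given as

$t = \dfrac{\bar{x} - \mu}{s/\sqrt{n}}$ in place of $z = \dfrac{\bar{x} - \mu}{\sigma/\sqrt{n}}$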
The t Distribution
The t distribution has the following properties
1. it has a mean of zero
2. it is symmetrical about the mean
3. In general, it has a variance greater than 1, but the variance approaches 1 as the sample size gets larger
For df > 2, the variance of the t distribution is

df/(df-2)
Alternatively, since in this case df = n - 1, for n > 3 the variance of the t distribution is
(n-1)/(n-3)
The t Distribution
4. The variable t ranges from $-\infty$ to $+\infty$
5. The t distribution is really a family of distributions, since there is a different distribution for each value of n-1
6. Compared to the normal distribution, the t distribution is less peaked in the center and has higher tails (see Figure 5.3.1)
7. The t distribution approaches the normal distribution as n-1 approaches infinity
Tables for the t distribution are given in all statistics books; the one in your book is Table E
The t Distribution
The general procedure for constructing
confidence intervals is not affected by our having
to use t distribution rather than the standard
normal distribution
So we can write that

estimator ± (reliability coefficient) × (standard error)

The only thing that is different is the source of the reliability coefficient
It is obtained from the t table
The t Distribution
To be more specific
When sampling is from a normal distribution whose standard deviation, $\sigma$, is unknown, the $100(1-\alpha)$ percent confidence interval for the population mean, $\mu$, is given as

$\bar{x} \pm t_{(1-\alpha/2),\,df}\,\dfrac{s}{\sqrt{n}}$

Example 5.3.1
We wish to estimate the mean serum amylase value in a healthy population.
Determinations were made on a sample of 15 apparently healthy subjects.
The sample yielded a mean of 96 units/100 mL and a standard deviation of 35 units/100 mL.
The population variance is unknown
Determine the 95% confidence interval for the true mean
Example 5.3.1
Here we assume that the population has a normal distribution
Then the standard error is

$s/\sqrt{n} = 35/\sqrt{15} = 9.04$

From the t table or using Excel, for 95% confidence and df = n-1 = 14, the value of t is 2.1448
Example 5.3.1
With all this we can now compute the confidence interval as
$96 \pm 2.1448(9.04)$
$96 \pm 19$
$77,\ 115$
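As a quick check of the arithmetic, here is a small Python sketch of the t-based interval. The function name `t_interval_mean` is hypothetical, and the t value is passed in by hand (2.1448 for 14 degrees of freedom, as read from the t table above) rather than computed, since the standard library has no t quantile function.

```python
from math import sqrt


def t_interval_mean(xbar, s, n, t_value):
    """Interval for a mean when the population variance is unknown.

    t_value is the reliability coefficient looked up for n - 1
    degrees of freedom at the desired confidence level.
    """
    se = s / sqrt(n)          # estimated standard error
    margin = t_value * se
    return xbar - margin, xbar + margin


# Example 5.3.1: n = 15, mean 96, s = 35, t = 2.1448 (df = 14, 95%)
lower, upper = t_interval_mean(96.0, 35.0, 15, 2.1448)
print(round(lower), round(upper))
```

Rounded to whole units, the limits match the 77 and 115 found by hand.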

Confidence Interval for the Difference Between Two Population Means
When the population variances are known, the $100(1-\alpha)$ percent confidence interval for $\mu_1 - \mu_2$ is

$(\bar{x}_1 - \bar{x}_2) \pm z_{(1-\alpha/2)} \sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}$
Example 5.4.1
A research team is interested in the difference between serum uric acid levels in patients with and without Down syndrome.
In a large hospital for the treatment of individuals with intellectual disabilities, a sample of 12 individuals with Down syndrome yielded a mean of $\bar{x}_1 = 4.5$ mg/100 mL
In a general hospital, a sample of 15 normal individuals of the same age and sex were found to have a mean value of $\bar{x}_2 = 3.4$ mg/100 mL
Assume that it is reasonable to treat the two populations of values as normally distributed with variances equal to 1.0
Find the 95% confidence interval for $\mu_1 - \mu_2$
Example 5.4.1
For an estimate of $\mu_1 - \mu_2$ we can use

$\bar{x}_1 - \bar{x}_2 = 4.5 - 3.4 = 1.1$

The corresponding z value (reliability coefficient) for 95% confidence can be found in Table C as 1.96
The standard error is then

$\sigma_{\bar{x}_1 - \bar{x}_2} = \sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}} = \sqrt{\dfrac{1}{12} + \dfrac{1}{15}} = 0.39$
Example 5.4.1
The 95% confidence interval is calculated as
$1.1 \pm 1.96(0.39)$
$1.1 \pm 0.8$
$0.3,\ 1.9$

Here we say that we are 95% confident that the true difference, $\mu_1 - \mu_2$, is between 0.3 and 1.9
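The two-sample calculation with known variances can be sketched in Python as follows. The function name `z_interval_diff_means` is an assumption for illustration, and z is taken from the standard library's `NormalDist`, so the unrounded limits are slightly tighter than the hand calculation that rounds the margin to 0.8.

```python
from math import sqrt
from statistics import NormalDist


def z_interval_diff_means(x1, x2, var1, var2, n1, n2, confidence):
    """Interval for mu1 - mu2 when both population variances are known."""
    alpha = 1.0 - confidence
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    se = sqrt(var1 / n1 + var2 / n2)  # standard error of the difference
    estimate = x1 - x2
    return estimate - z * se, estimate + z * se


# Example 5.4.1: means 4.5 and 3.4, variances 1.0, n1 = 12, n2 = 15
lower, upper = z_interval_diff_means(4.5, 3.4, 1.0, 1.0, 12, 15, 0.95)
print(f"{lower:.1f}, {upper:.1f}")
```

To one decimal place this reproduces the limits 0.3 and 1.9.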

Sampling from Nonnormal Population



If the sample sizes are large, you can use the central limit theorem to construct a confidence interval for the difference between two population means
Sampling from Nonnormal Population

Example 5.4.2: This example compares the economic status of patients treated in two hospitals.
The average family income of a sample of 75 patients admitted to hospital A ($\bar{x}_1$) was $68000, while the average based on a sample of 80 patients from hospital B ($\bar{x}_2$) was found to be $44500
If the population standard deviation for hospital A ($\sigma_1$) is $6000 and for hospital B ($\sigma_2$) is $5000, find the 99% confidence interval for $\mu_1 - \mu_2$, the true difference between population means
Example 5.4.2







The point estimate of $\mu_1 - \mu_2$ is
$\bar{x}_1 - \bar{x}_2 = 68000 - 44500 = 23500$ dollars
From Table C, we can find the z value (reliability coefficient) for 99% confidence (0.995) as 2.58
The standard error is

$\sigma_{\bar{x}_1 - \bar{x}_2} = \sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}} = \sqrt{\dfrac{(6000)^2}{75} + \dfrac{(5000)^2}{80}} = 890$

Then the 99% confidence interval is
$23500 \pm 2.58(890)$
$21204,\ 25796$ dollars
When to use the t distribution for the difference between means?
If the variances are unknown and we want to estimate the difference between two population means, we can use the t distribution with the following assumption:
the two sampled populations are normally distributed

If we accept this, then there are two scenarios
1. the population variances are equal
2. they are not equal
Population variances Equal
In this case we can use the pooled estimate of the variance

$s_p^2 = \dfrac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$
Population variances Equal
The standard error of the estimate is then given as

$\text{StdError} = \sqrt{\dfrac{s_p^2}{n_1} + \dfrac{s_p^2}{n_2}}$
Population variances Equal
Then the $100(1-\alpha)$ percent confidence interval is found as

$(\bar{x}_1 - \bar{x}_2) \pm t_{(1-\alpha/2),\,df} \sqrt{\dfrac{s_p^2}{n_1} + \dfrac{s_p^2}{n_2}}$

The degrees of freedom used to find the value of t from the table are $n_1 + n_2 - 2$
Example 5.4.3
Here we will use Example 5.3.1 again to illustrate this
Let us recall the example:
We wish to estimate the mean serum amylase value in a healthy population.
Determinations were made on a sample of 15 apparently healthy subjects.
The sample yielded a mean of 96 units/100 mL and a standard deviation of 35 units/100 mL.
The population variance is unknown
Suppose that, in addition to the apparently healthy subjects, serum amylase determinations were also made on an independent sample of 22 hospitalized subjects
Suppose that the mean and standard deviation from this sample are 120 and 40 units/mL, respectively
Let us designate the hospitalized subjects as sample 1 and the 15 healthy subjects as sample 2
So our point estimate of $\mu_1 - \mu_2$ is 120 - 96 = 24
Construct the 95% confidence interval for the difference between the mean serum amylase value for the hospitalized patients and the mean for the population of apparently healthy subjects
Example 5.4.3
The pooled estimate of the variance is

$s_p^2 = \dfrac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2} = \dfrac{(22 - 1)40^2 + (15 - 1)35^2}{22 + 15 - 2} = 1450$
Example 5.4.3
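Then the 95% confidence interval would be (using the t value 2.0301 for 35 degrees of freedom from Table E or Excel)

$(\bar{x}_1 - \bar{x}_2) \pm t_{(1-\alpha/2)} \sqrt{\dfrac{s_p^2}{n_1} + \dfrac{s_p^2}{n_2}}$
$(120 - 96) \pm 2.0301 \sqrt{\dfrac{1450}{22} + \dfrac{1450}{15}}$
$24 \pm 26$
$-2,\ 50$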
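The pooled-variance steps can be put together in a short Python sketch. The function names are hypothetical, and the t value 2.0301 (35 degrees of freedom, 95% confidence) is the table value quoted above, supplied by hand.

```python
from math import sqrt


def pooled_variance(s1, n1, s2, n2):
    """Pooled estimate of a common population variance."""
    return ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)


def pooled_t_interval(x1, s1, n1, x2, s2, n2, t_value):
    """Interval for mu1 - mu2 assuming equal population variances.

    t_value must be looked up for n1 + n2 - 2 degrees of freedom.
    """
    sp2 = pooled_variance(s1, n1, s2, n2)
    se = sqrt(sp2 / n1 + sp2 / n2)
    estimate = x1 - x2
    return estimate - t_value * se, estimate + t_value * se


# Example 5.4.3: hospitalized (sample 1) versus healthy (sample 2)
sp2 = pooled_variance(40.0, 22, 35.0, 15)
lower, upper = pooled_t_interval(120.0, 40.0, 22, 96.0, 35.0, 15, 2.0301)
print(sp2, round(lower), round(upper))
```

The pooled variance comes out as 1450, and the rounded limits are -2 and 50, as in the hand calculation.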
Population variances Not Equal
Here the t statistic built on the pooled variance

$s_p^2 = \dfrac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$

does not follow a t distribution with $n_1 + n_2 - 2$ degrees of freedom when the population variances are not equal
Population variances Not Equal
In this case the reliability factor is calculated with the following equation

$t'_{(1-\alpha/2)} = \dfrac{w_1 t_1 + w_2 t_2}{w_1 + w_2}$

where
$w_1 = s_1^2/n_1$, $w_2 = s_2^2/n_2$,
$t_1 = t_{(1-\alpha/2)}$ for $n_1 - 1$ degrees of freedom, and
$t_2 = t_{(1-\alpha/2)}$ for $n_2 - 1$ degrees of freedom
Population variances Not Equal
So, with this, an approximate $100(1-\alpha)$ percent confidence interval for $\mu_1 - \mu_2$ is obtained as

$(\bar{x}_1 - \bar{x}_2) \pm t'_{(1-\alpha/2)} \sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}$

where $t'_{(1-\alpha/2)} = (w_1 t_1 + w_2 t_2)/(w_1 + w_2)$, with $w_i = s_i^2/n_i$ and $t_i = t_{(1-\alpha/2)}$ for $n_i - 1$ degrees of freedom
Example 5.4.4
Total serum complement activity ($C_{H50}$) was assayed in 20 apparently healthy subjects and 10 independent subjects with disease.
Here are the results

Subjects        n     $\bar{x}$    s
With disease    10    62.6         33.8
Normal          20    47.2         10.1
Example 5.4.4
We assume that the sampled populations are approximately normally distributed
However, we do not expect the two population variances to be equal
Find the 95% confidence interval for $\mu_1 - \mu_2$

The point estimate for $\mu_1 - \mu_2$ is 62.6 - 47.2 = 15.4
The reliability coefficient is calculated after looking up the values of $t_1$ and $t_2$ in Table E:
$t_1 = 2.2622$ (9 degrees of freedom)
and
$t_2 = 2.0930$ (19 degrees of freedom)
Example 5.4.4
The value of $t'$ is

$t'_{(1-\alpha/2)} = \dfrac{w_1 t_1 + w_2 t_2}{w_1 + w_2} = \dfrac{\dfrac{(33.8)^2}{10}(2.2622) + \dfrac{(10.1)^2}{20}(2.0930)}{\dfrac{(33.8)^2}{10} + \dfrac{(10.1)^2}{20}} = 2.255$

where
$w_1 = s_1^2/n_1$, $w_2 = s_2^2/n_2$,
$t_1 = t_{(1-\alpha/2)}$ for $n_1 - 1$ degrees of freedom, and
$t_2 = t_{(1-\alpha/2)}$ for $n_2 - 1$ degrees of freedom
Example 5.4.4
The 95% confidence interval for $\mu_1 - \mu_2$ is
$15.4 \pm 2.255 \sqrt{\dfrac{(33.8)^2}{10} + \dfrac{(10.1)^2}{20}}$
$15.4 \pm 2.255(10.9245)$
$15.4 \pm 24.6$
$-9.2,\ 40.0$
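The weighted reliability factor and the resulting interval can be sketched in Python as below. The function name `t_prime_interval` is an assumption for illustration, and the two t values are the table values quoted in the example (9 and 19 degrees of freedom), supplied by hand.

```python
from math import sqrt


def t_prime_interval(x1, s1, n1, t1, x2, s2, n2, t2):
    """Approximate interval for mu1 - mu2 with unequal variances.

    t1 and t2 are table values for n1 - 1 and n2 - 1 degrees of
    freedom; they are combined with weights w_i = s_i^2 / n_i.
    """
    w1 = s1**2 / n1
    w2 = s2**2 / n2
    t_prime = (w1 * t1 + w2 * t2) / (w1 + w2)
    se = sqrt(w1 + w2)
    estimate = x1 - x2
    return t_prime, estimate - t_prime * se, estimate + t_prime * se


# Example 5.4.4: with disease (sample 1) versus normal (sample 2)
t_prime, lower, upper = t_prime_interval(62.6, 33.8, 10, 2.2622,
                                         47.2, 10.1, 20, 2.0930)
print(f"{t_prime:.3f}, {lower:.1f}, {upper:.1f}")
```

Note that the weighted factor always lies between the two table values, here between 2.0930 and 2.2622, and is pulled toward the t value of the sample with the larger $s^2/n$.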
Confidence Interval for a Population Proportion
Sometimes we are interested in population proportions, to answer questions like
1. What proportion of patients who receive a particular type of treatment recover?
2. What proportion of some population has a certain disease?
3. What proportion of a population is immune to a certain disease?
Confidence Interval for a Population Proportion
To estimate the population proportion we proceed as we did for the population mean
A sample is drawn from the population and the sample proportion ($\hat{p}$) is calculated
This sample proportion is used as the point estimator of the population proportion
A confidence interval is then calculated as

estimator ± (reliability coefficient) × (standard error)
Confidence Interval for a Population Proportion
In the previous chapter we saw that when np and n(1-p) are greater than 5, the sampling distribution of $\hat{p}$ is quite close to the normal distribution
If this is the case, the reliability coefficient is some value from the z table
The standard error is

$\sigma_{\hat{p}} = \sqrt{p(1-p)/n}$

Since the parameter p is unknown, we can then use $\hat{p}$ to estimate it:

$\sigma_{\hat{p}} \approx \sqrt{\hat{p}(1-\hat{p})/n}$
Confidence Interval for a Population Proportion
The $100(1-\alpha)$ percent confidence interval for p is then calculated as

$\hat{p} \pm z_{(1-\alpha/2)} \sqrt{\hat{p}(1-\hat{p})/n}$
Example 5.5.1
A survey was conducted to study the dental health practices and attitudes of a certain urban adult population.
Of 300 adults interviewed, 123 said that they regularly had a dental checkup twice a year.
We wish to construct a 95% confidence interval for the proportion of subjects in the sampled population who regularly have a dental checkup twice a year
Example 5.5.1
The best point estimate of the population proportion is

$\hat{p} = 123/300 = 0.41$
The sample size is large enough that we can use the standard normal distribution in constructing the confidence interval
The reliability coefficient is then the z value for 95% confidence, z = 1.96
The estimated standard error is
$\sigma_{\hat{p}} = \sqrt{\hat{p}(1-\hat{p})/n} = \sqrt{0.41(1 - 0.41)/300} = 0.028$
Example 5.5.1
The 95% confidence interval is then
$0.41 \pm 1.96(0.028)$
$0.41 \pm 0.05$
$0.36,\ 0.46$
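A minimal Python sketch of the proportion interval follows. The function name `proportion_interval` is an assumption made here. Because the code carries full precision instead of rounding the margin to 0.05, its limits (about 0.354 and 0.466) are slightly wider than the rounded hand result.

```python
from math import sqrt
from statistics import NormalDist


def proportion_interval(successes, n, confidence):
    """Large-sample interval for a population proportion."""
    p_hat = successes / n
    alpha = 1.0 - confidence
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    se = sqrt(p_hat * (1.0 - p_hat) / n)  # estimated standard error
    return p_hat, p_hat - z * se, p_hat + z * se


# Example 5.5.1: 123 of 300 adults, 95% confidence
p_hat, lower, upper = proportion_interval(123, 300, 0.95)
print(f"{p_hat:.2f}, {lower:.3f}, {upper:.3f}")
```

The check that $n\hat{p}$ and $n(1-\hat{p})$ exceed 5 is left to the caller in this sketch.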
Confidence Interval for the Difference
Between Two Population Proportions
Sometimes we want to compare two groups with respect to the proportions possessing some characteristic of interest
Then, a point estimate of the difference in two population proportions is provided by the difference in sample proportions, $\hat{p}_1 - \hat{p}_2$
The standard error of the estimate must be estimated by

$\hat{\sigma}_{\hat{p}_1 - \hat{p}_2} = \sqrt{\dfrac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \dfrac{\hat{p}_2(1-\hat{p}_2)}{n_2}}$
Confidence Interval for the Difference
Between Two Population Proportions
Then, a $100(1-\alpha)$ percent confidence interval for $p_1 - p_2$ is calculated as

$(\hat{p}_1 - \hat{p}_2) \pm z_{(1-\alpha/2)} \sqrt{\dfrac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \dfrac{\hat{p}_2(1-\hat{p}_2)}{n_2}}$
Example 5.6.1

Researchers wish to compare the effects of two treatments on mean recovery time of patients with a certain disease
200 patients were randomly divided into two equal groups
Of the first group, who received the standard treatment, 78 recovered within 3 days
Of the other 100, who were treated by a new method, 90 recovered within 3 days
The physician wished to estimate the true difference in the proportions in the two populations who would recover within 3 days

Example 5.6.1

The point estimate of the difference in the population proportions is $\hat{p}_1 - \hat{p}_2 = 0.78 - 0.90 = -0.12$

Then the 95% confidence interval for $p_1 - p_2$ is

$(\hat{p}_1 - \hat{p}_2) \pm z_{(1-\alpha/2)} \sqrt{\dfrac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \dfrac{\hat{p}_2(1-\hat{p}_2)}{n_2}}$
$(0.78 - 0.90) \pm 1.96 \sqrt{\dfrac{0.78(1 - 0.78)}{100} + \dfrac{0.90(1 - 0.90)}{100}}$
$-0.12 \pm 0.10$
$-0.22,\ -0.02$
Example 5.6.1

So, we say that we are 95% confident that the true difference $p_1 - p_2$ is between -0.22 and -0.02
Since the whole interval lies below zero, the data suggest that the recovery proportion is higher under the new method
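The same calculation can be sketched in Python. The function name `diff_proportions_interval` is hypothetical, and z is computed with the standard library's `NormalDist` rather than read from the table.

```python
from math import sqrt
from statistics import NormalDist


def diff_proportions_interval(x1, n1, x2, n2, confidence):
    """Large-sample interval for p1 - p2 from two independent samples."""
    p1 = x1 / n1
    p2 = x2 / n2
    alpha = 1.0 - confidence
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    se = sqrt(p1 * (1.0 - p1) / n1 + p2 * (1.0 - p2) / n2)
    estimate = p1 - p2
    return estimate - z * se, estimate + z * se


# Example 5.6.1: 78 of 100 (standard) versus 90 of 100 (new method)
lower, upper = diff_proportions_interval(78, 100, 90, 100, 0.95)
print(f"{lower:.2f}, {upper:.2f}")
```

Both limits are negative, which is the sign information that is easy to lose when the interval is written without the minus signs.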
Determination of Sample Size For
Estimating Means

The objectives in interval estimation are to obtain narrow intervals with high reliability
If we look at the components of a confidence interval, we can see that the width of the interval is determined by the magnitude of the quantity

(reliability coefficient) × (standard error)

since the total width of the interval is twice this amount
Determination of Sample Size For
Estimating Means

For a given standard error, increasing reliability means a larger reliability coefficient
On the other hand, a larger reliability coefficient results in a wider interval

(reliability coefficient) × (standard error)
Determination of Sample Size For
Estimating Means

However, if we fix the reliability coefficient at a value, then the only way to reduce the width of the interval is to reduce the standard error
Since the standard error is equal to $\sigma/\sqrt{n}$ and since $\sigma$ is constant, the only way to obtain a small standard error is to take a large sample

(reliability coefficient) × (standard error)
Determination of Sample Size For
Estimating Means

How large must the sample be?
This of course depends on the size of $\sigma$, the population standard deviation, and on the desired degree of reliability and the desired interval width
Now, suppose we want an interval that extends d units on either side of the estimator. Then

d = (reliability coefficient) × (standard error)
Determination of Sample Size For
Estimating Means

If sampling is to be with replacement, from an infinite population, or from a population that is sufficiently large to warrant our ignoring the finite population correction, then

$d = z \dfrac{\sigma}{\sqrt{n}}$

and we solve it for n:

$n = \dfrac{z^2 \sigma^2}{d^2}$
Determination of Sample Size For
Estimating Means

When sampling is without replacement from a small finite population, the finite population correction is required:

$d = z \dfrac{\sigma}{\sqrt{n}} \sqrt{\dfrac{N - n}{N - 1}}$

and we solve it for n:

$n = \dfrac{N z^2 \sigma^2}{d^2 (N - 1) + z^2 \sigma^2}$
Determination of Sample Size For
Estimating Means

Now, as you can see, these formulas for sample size require knowledge of $\sigma^2$
But the population variance is usually unknown and must be estimated
There are several methods of estimation, and some of them are given here
Determination of Sample Size For
Estimating Means

1. A preliminary sample may be drawn from the population, and the variance computed from this sample may be used as an estimate of $\sigma^2$
2. Estimates of $\sigma^2$ may be available from previous or similar studies
3. If we can assume that the population is approximately normally distributed, then the range R can be considered to be equal to about 6 standard deviations, and $\sigma \approx R/6$ can be used
Note that here you need to know the smallest and largest values of the variable in the population
Example 5.7.1

A health department nutritionist, wishing to conduct a survey among a population of teenage girls to determine their average daily protein intake, is seeking the advice of a biostatistician on the size of the sample that should be taken
What procedure does the biostatistician follow in providing assistance to the nutritionist?
Before the statistician can be of help to the nutritionist, the latter must provide three items of information:
The desired width of the confidence interval
The level of confidence desired
The magnitude of the population variance
Example 5.7.1

Let us say we want an interval about 10 units wide
This means the estimate should be within about 5 units of the true value on either side
Secondly, we want a confidence coefficient of 0.95
Also, from previous knowledge, the population standard deviation is 20 grams
Now, with all this, we have sufficient information to calculate the sample size
Example 5.7.1
Here
$z = 1.96$, $\sigma = 20$, $d = 5$

If we assume that we have a sufficiently large population, then we can ignore the finite population correction
Then the sample size is

$n = \dfrac{z^2 \sigma^2}{d^2} = \dfrac{(1.96)^2 (20)^2}{(5)^2} = 61.47$

So the necessary sample size is 62
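Both sample-size formulas for means can be sketched in Python as below. The function names are assumptions made for illustration, and the population size N = 1000 in the second call is a purely hypothetical value, not part of the example; it only shows how the finite population correction lowers the required n.

```python
from math import ceil


def sample_size_mean(z, sigma, d):
    """n for an interval extending d units either side (no correction)."""
    return ceil(z**2 * sigma**2 / d**2)


def sample_size_mean_fpc(z, sigma, d, N):
    """n when sampling without replacement from a finite population."""
    return ceil(N * z**2 * sigma**2 / (d**2 * (N - 1) + z**2 * sigma**2))


# Example 5.7.1: z = 1.96, sigma = 20, d = 5
n_large = sample_size_mean(1.96, 20.0, 5.0)

# Hypothetical finite population of N = 1000 (assumed for illustration)
n_finite = sample_size_mean_fpc(1.96, 20.0, 5.0, 1000)
print(n_large, n_finite)
```

The result is always rounded up, since rounding 61.47 down to 61 would give an interval slightly wider than requested.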
Determination of Sample Size For
Estimating Proportions
We proceed as we did in estimating means
Here we make use of the fact that one-half the desired interval width, d, may be set equal to the product of the reliability coefficient and the standard error
Assuming random sampling and conditions warranting approximate normality of the distribution of $\hat{p}$ leads to the following formula for n when sampling is with replacement, when sampling is from an infinite population, or when the sampled population is large enough to make use of the finite population correction unnecessary:

$n = \dfrac{z^2 p q}{d^2}$, where $q = 1 - p$
Determination of Sample Size For Estimating Proportions
If the finite population correction cannot be disregarded, the proper formula for n is:

$n = \dfrac{N z^2 p q}{d^2 (N - 1) + z^2 p q}$

When N is large in comparison to n (that is, $n/N \le 0.05$), the finite population correction may be ignored
Determination of Sample Size For
Estimating Proportions
As can be seen, in both cases we need to know p, the proportion in the population possessing the characteristic of interest
However, it is the value that we are trying to estimate, and therefore it is NOT known
Therefore we must have some previous estimate of it
Example 5.8.1
A survey is being planned to determine what proportion of the families in a certain area are medically indigent (poor)
It is believed that the proportion cannot be greater than 0.35
A 95% confidence interval is desired with d = 0.05
What size sample of families should be selected?
Example 5.8.1
If the finite population correction can be ignored, then we can calculate n as:

$n = \dfrac{z^2 p q}{d^2} = \dfrac{(1.96)^2 (0.35)(0.65)}{(0.05)^2} = 349.6$

So the necessary sample size is 350
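A short Python sketch of the proportion sample-size formula follows; the function name `sample_size_proportion` is an assumption made here. The second call uses p = 0.5, which is not part of the example: it is included only because pq is largest at p = 0.5, so that value gives the most conservative n when no prior estimate of p is available.

```python
from math import ceil


def sample_size_proportion(z, p, d):
    """n for estimating a proportion to within d (no finite correction)."""
    q = 1.0 - p
    return ceil(z**2 * p * q / d**2)


# Example 5.8.1: z = 1.96, p at most 0.35, d = 0.05
n_planned = sample_size_proportion(1.96, 0.35, 0.05)

# Most conservative choice, p = 0.5 (illustration, not from the example)
n_conservative = sample_size_proportion(1.96, 0.5, 0.05)
print(n_planned, n_conservative)
```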
Confidence Interval for the Variance of a
Normally Distributed Population

Here we will look at two different cases
Point estimate of the population Variance
Interval Estimation of a Population Variance
Point estimate of the population
Variance

So far we have used the sample variance as an estimator of the population variance when it is unknown
But how good is this estimator?
We have judged estimators on just one criterion: unbiasedness
Here we will look at whether the sample variance is an unbiased estimator of the population variance
To be unbiased, the average value of the sample variance over all possible samples must be equal to the population variance
Point estimate of the population
Variance

That is, the expression $E(s^2) = \sigma^2$ must hold
To examine this statement, we will use the example given in Chapter 4, Section 4.4, and Table 4.4.1
There, we have all possible samples of size 2 from the population consisting of the values 6, 8, 10, 12, 14.

Point estimate of the population
Variance

Remember that for that example

$\sigma^2 = \dfrac{\sum (x_i - \mu)^2}{N} = 8$ and $S^2 = \dfrac{\sum (x_i - \mu)^2}{N - 1} = 10$

Now, if we compute the sample variances for each of the possible samples shown in Table 4.4.1, we get the following table
Point estimate of the population
Variance
Sample variances $s^2$ for all samples of size 2 (rows: first draw; columns: second draw)

First draw    Second draw
               6     8    10    12    14
 6             0     2     8    18    32
 8             2     0     2     8    18
10             8     2     0     2     8
12            18     8     2     0     2
14            32    18     8     2     0
Point estimate of the population
Variance

If sampling is with replacement, the expected value of $s^2$ is obtained by taking the mean of all 25 sample variances in the table above
When we do this

$E(s^2) = \dfrac{\sum s_i^2}{N^n} = \dfrac{0 + 2 + 8 + \dots + 8 + 2 + 0}{25} = \dfrac{200}{25} = 8$
Point estimate of the population Variance

Here we see, for example, that when sampling is with replacement $E(s^2) = \sigma^2$, where

$s^2 = \dfrac{\sum (x_i - \bar{x})^2}{n - 1}$ and $\sigma^2 = \dfrac{\sum (x_i - \mu)^2}{N}$
Point estimate of the population
Variance
If we consider the case where sampling is without replacement, the expected value of $s^2$ is obtained by taking the mean of all variances above (or below) the principal diagonal
That is,

$E(s^2) = \dfrac{\sum s_i^2}{{}_N C_n} = \dfrac{2 + 8 + \dots + 8 + 2}{10} = \dfrac{100}{10} = 10$

Here we see that the result is not equal to $\sigma^2$ but is equal to

$S^2 = \dfrac{\sum (x_i - \mu)^2}{N - 1}$
Point estimate of the population
Variance
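These results are examples of general principles, as it can be shown that
$E(s^2) = \sigma^2$ when sampling is with replacement
$E(s^2) = S^2$ when sampling is without replacement, where $S^2 = \sum (x_i - \mu)^2/(N - 1)$

When N is large, N-1 and N will be approximately equal and, consequently, $\sigma^2$ and $S^2$ will be approximately equal
So we can see that $s^2$ is an unbiased estimator of $\sigma^2$ (exactly when sampling with replacement, and approximately when sampling without replacement from a large population)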
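The two expected values can be checked by brute force. The following Python sketch enumerates every sample of size 2 from the population 6, 8, 10, 12, 14 and averages the sample variances; the helper name `sample_variance` is an assumption for illustration.

```python
from itertools import combinations, product

population = [6, 8, 10, 12, 14]


def sample_variance(sample):
    """Sample variance with the n - 1 divisor."""
    n = len(sample)
    mean = sum(sample) / n
    return sum((x - mean) ** 2 for x in sample) / (n - 1)


# With replacement: all 25 ordered pairs (the full table)
with_repl = [sample_variance(s) for s in product(population, repeat=2)]
e_with = sum(with_repl) / len(with_repl)

# Without replacement: the 10 pairs above the principal diagonal
without_repl = [sample_variance(s) for s in combinations(population, 2)]
e_without = sum(without_repl) / len(without_repl)

print(e_with, e_without)
```

The first average equals the population variance computed with divisor N, and the second equals the value computed with divisor N - 1.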
Interval Estimate of the Population
Variance
If we have a point estimate, then it is possible to construct a confidence interval for a population variance
Success here depends on our ability to find an appropriate sampling distribution
Confidence intervals for $\sigma^2$ are usually based on the sampling distribution of $(n-1)s^2/\sigma^2$.
If samples of size n are drawn from a normally distributed population, this quantity has a distribution known as the chi-square ($\chi^2$) distribution with n-1 degrees of freedom
Interval Estimate of the Population
Variance
We will see the details of the chi-square distribution later
Here we will only say that it is the distribution that the quantity $(n-1)s^2/\sigma^2$ follows, and that it is useful in finding confidence intervals for $\sigma^2$ when the assumption that the population is normally distributed holds true
Figure 5.9.1 in your book shows some chi-square distributions
As you can see, as the degrees of freedom increase, the chi-square distribution approaches the normal distribution
Table F gives the $\chi^2$ values for different degrees of freedom
Interval Estimate of the Population
Variance
To obtain a $100(1-\alpha)$ percent confidence interval for $\sigma^2$, we first obtain the $100(1-\alpha)$ percent confidence interval for $(n-1)s^2/\sigma^2$
To do this, we select the values of $\chi^2$ from Table F in such a way that $\alpha/2$ is to the left of the smaller value and $\alpha/2$ is to the right of the larger value
So, the two values of $\chi^2$ are selected in such a way that $\alpha$ is divided equally between the two tails of the distribution
We may designate these two values of $\chi^2$ as $\chi^2_{\alpha/2}$ and $\chi^2_{1-\alpha/2}$, respectively


Interval Estimate of the Population
Variance
The $100(1-\alpha)$ percent confidence interval for $(n-1)s^2/\sigma^2$ is then given as

$\chi^2_{\alpha/2} < \dfrac{(n-1)s^2}{\sigma^2} < \chi^2_{1-\alpha/2}$

Now, we can work on this expression so that we obtain an expression with $\sigma^2$ alone as the middle term
Interval Estimate of the Population
Variance
First, let us divide each term by $(n-1)s^2$ to get

$\dfrac{\chi^2_{\alpha/2}}{(n-1)s^2} < \dfrac{1}{\sigma^2} < \dfrac{\chi^2_{1-\alpha/2}}{(n-1)s^2}$

If we take the reciprocal of this expression we have

$\dfrac{(n-1)s^2}{\chi^2_{\alpha/2}} > \sigma^2 > \dfrac{(n-1)s^2}{\chi^2_{1-\alpha/2}}$
Interval Estimate of the Population Variance
Note that the direction of the inequalities changed when we took the reciprocals
If we reverse the order of the terms we have

$\dfrac{(n-1)s^2}{\chi^2_{1-\alpha/2}} < \sigma^2 < \dfrac{(n-1)s^2}{\chi^2_{\alpha/2}}$

This is the $100(1-\alpha)$ percent confidence interval for $\sigma^2$
Interval Estimate of the Population Variance
If we take the square root of each term in the interval for $\sigma^2$, we have the following $100(1-\alpha)$ percent confidence interval for $\sigma$, the population standard deviation:

$\sqrt{\dfrac{(n-1)s^2}{\chi^2_{1-\alpha/2}}} < \sigma < \sqrt{\dfrac{(n-1)s^2}{\chi^2_{\alpha/2}}}$
Example 5.9.1
Now, let us go back to Example 5.3.1 and assume that the population of serum amylase determinations from which the sample of size 15 was drawn is normally distributed
We wish to construct the 95% confidence interval for $\sigma^2$
Example 5.9.1
There the sample yielded a value of $s^2 = 35^2 = 1225$
The degrees of freedom are n-1 = 14
The appropriate values of $\chi^2$ from Table F are
$\chi^2_{1-\alpha/2} = 26.119$
and
$\chi^2_{\alpha/2} = 5.629$
Example 5.9.1
The 95% confidence interval for $\sigma^2$ is

$\dfrac{(n-1)s^2}{\chi^2_{1-\alpha/2}} < \sigma^2 < \dfrac{(n-1)s^2}{\chi^2_{\alpha/2}}$
$\dfrac{(15-1)1225}{26.119} < \sigma^2 < \dfrac{(15-1)1225}{5.629}$
$656.6101 < \sigma^2 < 3046.7223$

The 95% confidence interval for $\sigma$ is then
$25.62 < \sigma < 55.20$
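The interval for the variance can be sketched in Python as below. The function name `variance_interval` is hypothetical, and the two chi-square values are the Table F values quoted in the example (14 degrees of freedom), passed in by hand because the standard library has no chi-square quantile function.

```python
from math import sqrt


def variance_interval(s2, n, chi2_lower, chi2_upper):
    """Interval for sigma^2 from a normally distributed population.

    chi2_lower has alpha/2 of the area to its left and chi2_upper
    has alpha/2 to its right, both for n - 1 degrees of freedom.
    """
    numerator = (n - 1) * s2
    # Dividing by the larger chi-square value gives the lower limit.
    return numerator / chi2_upper, numerator / chi2_lower


# Example 5.9.1: s^2 = 1225, n = 15, table values 5.629 and 26.119
var_low, var_high = variance_interval(1225.0, 15, 5.629, 26.119)
sd_low, sd_high = sqrt(var_low), sqrt(var_high)
print(f"{var_low:.2f} {var_high:.2f} {sd_low:.2f} {sd_high:.2f}")
```

Unlike the intervals for means, this interval is not symmetric about the point estimate: 1225 is much closer to the lower limit than to the upper one, because the chi-square distribution is skewed.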

Confidence Interval for the Ratio of the Variances
of Two Normally Distributed Populations
Here we would like to compare two variances
One way to do this is to form their ratio, $\sigma_1^2/\sigma_2^2$
If the two variances are equal, then the ratio will be 1
However, we do not know the population variances, and therefore the comparison will be based on sample variances
Confidence Interval for the Ratio of the Variances
of Two Normally Distributed Populations
So, we need to rely on some sampling distribution, and this time the distribution of

$\dfrac{s_1^2/\sigma_1^2}{s_2^2/\sigma_2^2}$

is utilized, provided certain assumptions are met
The assumptions are that $s_1^2$ and $s_2^2$ are computed from independent samples of size $n_1$ and $n_2$, respectively, drawn from two normally distributed populations
Confidence Interval for the Ratio of the Variances
of Two Normally Distributed Populations
If the assumptions are met, the expression

$\dfrac{s_1^2/\sigma_1^2}{s_2^2/\sigma_2^2}$

follows a distribution known as the F distribution
The details of the F distribution will be given in later chapters
Here, this distribution depends on two degrees of freedom values:
one corresponding to the $n_1 - 1$ used in computing $s_1^2$ (numerator) and one corresponding to the $n_2 - 1$ used in computing $s_2^2$ (denominator)
Confidence Interval for the Ratio of the Variances
of Two Normally Distributed Populations
Figure 5.10.1 shows some F distributions for several combinations of numerator and denominator degrees of freedom
Table G contains, for specified combinations of degrees of freedom and values of $\alpha$, F values to the right of which lies $\alpha/2$ of the area under the curve of F
Confidence Interval for the Ratio of the Variances
of Two Normally Distributed Populations
To find the $100(1-\alpha)$ percent confidence interval for $\sigma_1^2/\sigma_2^2$, we begin with the expression

$F_{\alpha/2} < \dfrac{s_1^2/\sigma_1^2}{s_2^2/\sigma_2^2} < F_{1-\alpha/2}$

where $F_{\alpha/2}$ and $F_{1-\alpha/2}$ are the values from the F table to the left and right of which, respectively, lies $\alpha/2$ of the area under the curve
Confidence Interval for the Ratio of the Variances
of Two Normally Distributed Populations
The middle term of this expression may be rewritten so that the entire expression is

$F_{\alpha/2} < \dfrac{s_1^2}{s_2^2} \cdot \dfrac{\sigma_2^2}{\sigma_1^2} < F_{1-\alpha/2}$

If we divide through by $s_1^2/s_2^2$ we have

$\dfrac{F_{\alpha/2}}{s_1^2/s_2^2} < \dfrac{\sigma_2^2}{\sigma_1^2} < \dfrac{F_{1-\alpha/2}}{s_1^2/s_2^2}$
Confidence Interval for the Ratio of the Variances
of Two Normally Distributed Populations




If we take the reciprocal of the expression

$\dfrac{F_{\alpha/2}}{s_1^2/s_2^2} < \dfrac{\sigma_2^2}{\sigma_1^2} < \dfrac{F_{1-\alpha/2}}{s_1^2/s_2^2}$

we have

$\dfrac{s_1^2/s_2^2}{F_{\alpha/2}} > \dfrac{\sigma_1^2}{\sigma_2^2} > \dfrac{s_1^2/s_2^2}{F_{1-\alpha/2}}$
Confidence Interval for the Ratio of the Variances
of Two Normally Distributed Populations
Now, if we reverse the order, we have the following $100(1-\alpha)$ percent confidence interval for $\sigma_1^2/\sigma_2^2$:

$\dfrac{s_1^2/s_2^2}{F_{1-\alpha/2}} < \dfrac{\sigma_1^2}{\sigma_2^2} < \dfrac{s_1^2/s_2^2}{F_{\alpha/2}}$
Example 5.10.1
Researchers selected a simple random sample of size 21 from a population of apparently healthy adult subjects (sample 1)
They selected an independent simple random sample of size 16 from a population of patients with Parkinson's disease (sample 2)
The variable of interest was reaction time to a particular stimulus
The sample variances were 1600 for sample 1 and 1225 for sample 2
Here we wish to construct the 95 percent confidence interval for $\sigma_1^2/\sigma_2^2$
Example 5.10.1
Here we have the following information
$n_1 = 21$, $n_2 = 16$
$s_1^2 = 1600$, $s_2^2 = 1225$
numerator degrees of freedom = 20
denominator degrees of freedom = 15
$\alpha = 0.05$
$F_{0.025} = 0.389$
$F_{0.975} = 2.76$

to be used in

$\dfrac{s_1^2/s_2^2}{F_{1-\alpha/2}} < \dfrac{\sigma_1^2}{\sigma_2^2} < \dfrac{s_1^2/s_2^2}{F_{\alpha/2}}$
Example 5.10.1
Now, as you can see, we find the value 2.76 in Table G
But how did we find the value of 0.389?
It comes from the equation

$F_{1-\alpha/2,\,df_1,\,df_2} = \dfrac{1}{F_{\alpha/2,\,df_2,\,df_1}}$
Example 5.10.1
By rearranging the equation

$F_{\alpha/2,\,df_1,\,df_2} = \dfrac{1}{F_{1-\alpha/2,\,df_2,\,df_1}}$

$F_{0.05/2,\,20,\,15} = \dfrac{1}{F_{1-0.05/2,\,15,\,20}} = \dfrac{1}{2.57} = 0.389$
Example 5.10.1
With this, the lower confidence limit (LCL) and upper confidence limit (UCL) for $\sigma_1^2/\sigma_2^2$ are as follows

$LCL = \dfrac{s_1^2}{s_2^2} \cdot \dfrac{1}{F_{1-\alpha/2,\,df_1,\,df_2}}$

$UCL = \dfrac{s_1^2}{s_2^2} \cdot \dfrac{1}{F_{\alpha/2,\,df_1,\,df_2}}$
Example 5.10.1
Now we can calculate the 95% confidence interval for $\sigma_1^2/\sigma_2^2$ as follows

$\dfrac{1600/1225}{2.76} < \dfrac{\sigma_1^2}{\sigma_2^2} < \dfrac{1600/1225}{0.389}$

$0.473 < \dfrac{\sigma_1^2}{\sigma_2^2} < 3.36$
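The limits for the variance ratio can be sketched in Python as below. The function name `variance_ratio_interval` is an assumption made here, and both F values are taken from the example: 2.76 is the Table G value for (20, 15) degrees of freedom, and the lower-tail value is obtained as the reciprocal of 2.57, the table value for (15, 20) degrees of freedom.

```python
def variance_ratio_interval(s1_sq, s2_sq, f_upper, f_upper_swapped):
    """Interval for sigma1^2 / sigma2^2 from two normal populations.

    f_upper:         F with alpha/2 to its right for (df1, df2)
    f_upper_swapped: F with alpha/2 to its right for (df2, df1);
                     its reciprocal is the lower-tail F for (df1, df2).
    """
    ratio = s1_sq / s2_sq
    f_lower = 1.0 / f_upper_swapped
    return f_lower, ratio / f_upper, ratio / f_lower


# Example 5.10.1: s1^2 = 1600 (df 20), s2^2 = 1225 (df 15)
f_lower, lcl, ucl = variance_ratio_interval(1600.0, 1225.0, 2.76, 2.57)
print(f"{f_lower:.3f}, {lcl:.3f}, {ucl:.2f}")
```

Since the interval from 0.473 to 3.36 contains 1, these data are compatible with the two population variances being equal.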
