
2 Estimation and Hypothesis Testing

2.1 Introduction
We wish to obtain the mean and variance of data, with confidence in our conclusions. We have assumed so far that the distribution is known: given parameters, we deduce something about data. A more common scenario involves being given data: we must now induce the distribution; we check a variety of different models which attempt to explain how data would look given specific values for the parameters involved. Statistical inference refers to estimation of population parameters given a small sample, usually for the purpose of hypothesis testing (i.e. is a parameter within specified bounds? If so, does it confirm/reject some hypothesis?). Estimation is about identifying how the system (model) behaves in untested situations. Knowing the essentials of statistical inference, we can handle ANOVA, regression, contingency tables etc. (We look at the classical frequentist hypothesis testing approach in this chapter, and briefly develop Bayesian approaches in the next.)
Example: Given a set of marks, do they follow N(μ, σ²)? What are μ and σ? (Asking for μ and σ = getting point estimates. We could also ask for interval estimates: X̄ ± some multiple of σ.)

Population vs. sample Objective: given the distribution of birth weights, find N(μ, σ²). Ideally, these parameters must be based on the entire population. Instead, given practical limitations, we select a random sample of n babies as representative of the entire population.
Random sample: choose some members of a population, with each member chosen independently and having a nonzero probability of being chosen. Simple random sample: each member of a random sample has the same probability of being chosen. Cluster sampling: if the entire population cannot be uniformly sampled, divide it into clusters and sample some clusters. It is very difficult to verify that a sample is really a random sample. The population is assumed to be (effectively) infinite. Sampling theory must be used if the population is actually finite.

Selection of random samples Usually computer generated. A random number (digit) = a random variable which takes a value from 0 to 9 with equal probability: P(X = 0) = P(X = 1) = ... = P(X = 9) = 1/10 (i.e. a uniform distribution). Computer-generated random numbers: each digit is equally likely to occur, and the value of one digit is independent of any other digit.
Clinical trials for evaluating drug candidates require specific randomization designs to avoid bias:
• Block randomization: one block gets the treatment and another the placebo.
• Blind trials: patients do not know what they get.
• Double blind trials: both doctors and patients do not know what they get.

2.1.1 Estimation of the mean

Point estimation of the mean

What is the sampling distribution of the mean? Given X₁, X₂, ..., Xₙ, estimate E[X] = μ and σ². (Here μ = E[X] is the population mean and X̄ = (1/n) Σᵢ₌₁ⁿ Xᵢ is the sample mean.) There may have been many sets of samples of size n, each of which would have resulted in its own X̄ (X̄₁, X̄₂, etc.). Obviously X̄ is itself a random variable and has its own distribution. Therefore X̄ = (X₁ + X₂ + ... + Xₙ)/n and

    E[X̄] = (1/n) Σᵢ₌₁ⁿ E[Xᵢ] = (1/n)(nμ) = μ    (2.1)

Hence the expectation of the sample mean = the population mean of each observation: E[X] = μ and E[X̄] = μ. This is convenient: even though each measurement ought to be at or near the true underlying value, by measuring many times and averaging, we get a better way to estimate μ.

Estimators Example: Let X = # of heads in 100 tosses of a fair coin. Then we expect E[X] = μ. From the above discussion, even E[X̄] = μ. In general, an estimator μ̂ is said to be an unbiased estimator of μ if E[μ̂] = μ; thus X̄ is an unbiased estimator of μ. Other unbiased estimators of the population mean include the sample median, and the midrange (= average of the smallest and largest values). Obviously there could be many possible estimators. The sample mean is the preferred estimator, given a choice of estimators, if the population distribution is normal: X̄ is then the estimator of μ with the smallest variance.

Standard error of the mean X̄ is claimed to be an unbiased estimator of μ for any sample size n. Intuitively, however, we would prefer large n (sample more, gain accuracy). To understand this, note that

    Var[X̄] = Var[(1/n) Σᵢ₌₁ⁿ Xᵢ] = (1/n²) Σᵢ₌₁ⁿ Var[Xᵢ] = (1/n²)(nσ²) = σ²/n    (2.2)

(Var[Xᵢ] = σ² ≠ f(sample size), but Var[X̄] = σ²/n = f(sample size). As n → ∞, Var[X̄] → 0, which implies that X̄ → μ.)
Terminology: standard deviation refers to a directly measured variable; standard error refers to a derived quantity such as the mean. Hence the standard error of the sample mean, given random samples of size n (X₁...Xₙ) with population mean μ and variance σ², is σ/√n. The standard error of the mean = the standard deviation of X̄, and is ≠ the standard deviation of an individual observation (σ). The larger the sample size n, the better the estimate of μ. The standard error also depends on σ. Remember, σ ≠ f(sample size); in fact, from our discussion on errors, σ is a function of experimental error etc.
σ² is usually unknown. Hence an estimator for σ² is the sample variance S², and the standard error of the mean is σ/√n ≈ s/√n. X̄ is one of the few measures for which the standard error may be calculated; hence statistics relies heavily on the mean, even though the median, mode, and trimmed mean often perform better with respect to outliers. (The trimmed mean is calculated after throwing out outliers according to some criterion.) The standard error of the median or of the interquartile range is unknown (unless extensive sampling is done).
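The σ/√n behaviour is easy to check by simulation. A minimal sketch (numpy, with invented population parameters; not part of the original notes):

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma = 50.0, 10.0              # hypothetical population mean and SD

for n in [4, 16, 64, 256]:
    # 20,000 independent samples of size n; one sample mean per row
    means = rng.normal(mu, sigma, size=(20_000, n)).mean(axis=1)
    print(f"n={n:4d}  sd of sample means={means.std(ddof=1):.3f}  "
          f"sigma/sqrt(n)={sigma / np.sqrt(n):.3f}")
```

The two printed columns agree to within simulation noise, halving every time n quadruples.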

2.1.2 Point estimation of the sample variance

The expected value of the sample variance may be computed as follows. First note that the variance of X̄ is σ²/n. Then, using

    (Xᵢ − X̄)² = (Xᵢ − μ + μ − X̄)² = (Xᵢ − μ)² + (μ − X̄)² + 2(Xᵢ − μ)(μ − X̄)

we get

    Σᵢ₌₁ⁿ (Xᵢ − X̄)² = Σᵢ₌₁ⁿ (Xᵢ − μ)² + Σᵢ₌₁ⁿ (μ − X̄)² + 2 Σᵢ₌₁ⁿ (Xᵢ − μ)(μ − X̄)

The middle term on the right is n(μ − X̄)². The last term is

    2 Σᵢ₌₁ⁿ (Xᵢ − μ)(μ − X̄) = 2(μ − X̄) Σᵢ₌₁ⁿ (Xᵢ − μ) = 2(μ − X̄)·n(X̄ − μ) = −2n(X̄ − μ)²

and hence

    Σᵢ₌₁ⁿ (Xᵢ − X̄)² = Σᵢ₌₁ⁿ (Xᵢ − μ)² − n(X̄ − μ)²

Dividing by n and then taking the expectation of each term,

    E[Σᵢ₌₁ⁿ (Xᵢ − X̄)²/n] = E[Σᵢ₌₁ⁿ (Xᵢ − μ)²/n] − E[(X̄ − μ)²]

But

    E[Σᵢ₌₁ⁿ (Xᵢ − μ)²/n] = Var[X] = σ²  and  E[(X̄ − μ)²] = Var[X̄] = σ²/n

Hence

    E[Σᵢ₌₁ⁿ (Xᵢ − X̄)²/n] = σ² − σ²/n = ((n−1)/n) σ²

If the sample variance were to be defined for one sampling attempt as S²(Xᵢ) = Σᵢ₌₁ⁿ (Xᵢ − X̄)²/n, then over many such sampling attempts the average (expected) sample variance would be

    E[S²(Xᵢ)] = ((n−1)/n) σ² ≠ σ²   (where S² is defined on the basis of n)

This implies that to get an unbiased estimate of σ²,

    E[Σᵢ₌₁ⁿ (Xᵢ − X̄)²/(n−1)] = σ²    (2.3)

i.e. S² defined using n−1 in the denominator is an unbiased estimator of σ².

Estimators: Of all the possible estimators of μ, we have used X̄ = the minimum variance estimator of μ. Hence if X̄ is used, we are on average getting a sample variance slightly lower than the population variance, and hence the n/(n−1) correction factor is required (can you see this from the above derivation?). Therefore, to get an unbiased estimate of σ², a preferable form of the sample variance is

    S² = Σᵢ₌₁ⁿ (Xᵢ − X̄)²/(n−1)

This denominator (n−1) is sometimes referred to as the degrees of freedom. In this case the n sampling attempts contribute n degrees of freedom, with one being used to estimate the mean.
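The /n versus /(n−1) denominators correspond to ddof=0 versus ddof=1 in numpy. A minimal sketch (invented population values) that exhibits the (n−1)/n bias directly:

```python
import numpy as np

rng = np.random.default_rng(1)
sigma2, n, trials = 4.0, 5, 200_000    # hypothetical variance, small n

samples = rng.normal(0.0, np.sqrt(sigma2), size=(trials, n))
with_n   = samples.var(axis=1, ddof=0).mean()   # denominator n
with_nm1 = samples.var(axis=1, ddof=1).mean()   # denominator n-1
print(f"E[S^2] using /n    : {with_n:.3f}  (theory: {(n - 1) / n * sigma2:.3f})")
print(f"E[S^2] using /(n-1): {with_nm1:.3f}  (theory: {sigma2:.3f})")
```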

Sampling distributions
Given that we expect to have limited data, parameter estimation may be assumed to fall into one of three scenarios:
1. Estimation of the mean with σ known.
2. Estimation of the mean with σ unknown.
3. Estimation of σ.
These situations are discussed in the next three sections.

2.1.3 Estimation of the mean when σ is known

Take a random sample of size n from a population having mean μ and variance σ². Then X̄ is a random variable with E[X̄] = μ. For samples from infinite populations,

    Var[X̄] = E[(X̄ − μ)²] = σ²/n

For samples from a finite population of size N, a finite population correction factor is used, and

    Var[X̄] = (σ²/n) · (N − n)/(N − 1)    (2.4)

When we consider the distributions of X̄ (instead of Xᵢ) and σ/√n (instead of σ), we are considering sampling distributions.

Chebyshev's theorem again: Chebyshev's theorem for a random variable was

    P(|Xᵢ − μ| > kσ) ≤ 1/k²  ⟹  P(|Xᵢ − μ| < kσ) ≥ 1 − 1/k²

For a sampling distribution,

    P(|X̄ − μ| < kσ/√n) ≥ 1 − 1/k²    (2.5)

Let kσ/√n = ε = some tolerance. Then 1/k² = σ²/(nε²) and therefore

    P(|X̄ − μ| < ε) ≥ 1 − σ²/(nε²)    (2.6)

(For a given tolerance ε, as n increases, X̄ → μ. This is the law of large numbers.)
Central Limit Theorem Let Z = the standardized sample mean:

    Z = (X̄ − μ)/(σ/√n)    (2.7)

Then the Central Limit Theorem implies that Z is a random variable whose distribution function approaches N(0, 1) as n → ∞. Regardless of the population distribution, as n increases, the sampling distribution of X̄ approaches a Normal distribution. If the underlying distribution is normal, then X̄ itself follows a normal distribution: X̄ ∼ N(μ, σ²/n).
Let a population have mean μ and variance σ², and let the random samples be X₁...Xₙ. Then the Central Limit Theorem implies that for large n, X̄ ∼ N(μ, σ²/n) even if the underlying distribution is not normal. Thus, even if an individual measurement does not follow a Normal distribution, the measured sample mean does. (This is a profound result. Estimating μ for some random variable X may be difficult given the possibility of error (large σ) or given that the underlying distribution might be unusual. The way out is to keep sampling.)

Interval estimation given μ and σ Given Xᵢ with population mean μ and variance σ², X̄ ∼ N(μ, σ²/n). If μ and σ are known, how a set of samples would behave is precisely known. For example, 95% of all sample means should lie between μ − 1.96σ/√n and μ + 1.96σ/√n. If Z = (X̄ − μ)/(σ/√n), then Z follows a standard Normal distribution, and 95% of Z values (those between the 2.5 and 97.5 percentiles) fall between −1.96 and 1.96. In general, the probability is 1 − α that

    −Z_{1−α/2} ≤ (X̄ − μ)/(σ/√n) ≤ Z_{1−α/2}  ⟹  |X̄ − μ|/(σ/√n) ≤ Z_{1−α/2}

(Z_{1−α/2} takes on values such as Z₀.₉₇₅ = 1.96 and Z₀.₉₉₅ = 2.575.) This can be understood as: the maximum error of the estimate = |X̄ − μ|, and it is < Z_{1−α/2} σ/√n with probability 1 − α.
An estimate of the sample size needed, if we wish to keep the estimate error at a confidence level (specified by α) to some tolerance level (= |X̄ − μ|), is

    n = (Z_{1−α/2} σ / |X̄ − μ|)²    (2.8)

The interval is (with 100(1−α)% probability)

    X̄ − Z_{1−α/2} σ/√n < μ < X̄ + Z_{1−α/2} σ/√n    (2.9)

Figure 2.1. The Z test (the N(0, 1) density with area α/2 in each tail, beyond Z_{α/2} and Z_{1−α/2}).
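A minimal sketch of (2.8) and (2.9) in code (illustrative numbers, not from the notes):

```python
import numpy as np
from scipy.stats import norm

xbar, sigma, n, alpha = 115.0, 24.0, 100, 0.05   # hypothetical values
z = norm.ppf(1 - alpha / 2)                      # Z_{1-alpha/2} = 1.96
half = z * sigma / np.sqrt(n)
print(f"95% CI for mu: [{xbar - half:.2f}, {xbar + half:.2f}]")

# Sample size (2.8) to keep the maximum error below a tolerance of 2 units
tol = 2.0
print("required n:", int(np.ceil((z * sigma / tol) ** 2)))
```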

2.1.4 Estimation of the mean when σ is unknown

When σ is unknown, the sampling distribution for a normalized version of X̄ is no longer Z ∼ N(0, 1). σ is unknown, and S is used as an estimator of σ. (X̄ − μ)/(S/√n) does not follow the standard normal distribution, but is instead a function of the sample size, and therefore represents a family of distributions. Hence n would have to be specified to get the appropriate distribution. (This insight is due to William Gosset, who published under the pseudonym 'Student', and hence it is also called the Student's t distribution.)
If X₁...Xₙ are values of a random variable ∼ N(μ, σ²) and are independent, then the t statistic calculated from the data is

    t = (X̄ − μ)/(S/√n)    (2.10)

t follows a t distribution with n−1 degrees of freedom. The 100u-th percentile of a t distribution with d degrees of freedom is denoted t_{d,u}, with P(t_d < t_{d,u}) = u. Thus, t_{20,0.95} = the 95th percentile (upper 5%) of a t distribution with 20 degrees of freedom. The t distribution is symmetric, and the difference from a N(0, 1) distribution is greatest for n < 30. Hence when σ is unknown, use the t statistic instead of the Z statistic. This t statistic follows a t distribution with df = n−1. E[t] = 0, Var[t] > 1 and → 1 as n → ∞; that is, as n → ∞, t → N(0, 1). (For n > 30, N(0, 1) may be used to approximate the t distribution.)

    d     t_{d,0.975}   Z_{0.975}
    4     2.776         1.960
    9     2.262         1.960
    29    2.045         1.960
    60    2.000         1.960
    ∞     1.960         1.960
Interval estimate: For a given n, 100(1−α)% of the t statistics should fall between the α/2 and 1−α/2 percentiles:

    P(t_{n−1,α/2} < t < t_{n−1,1−α/2}) = 1 − α

or

    t_{n−1,α/2} < (X̄ − μ)/(S/√n)  and  (X̄ − μ)/(S/√n) < t_{n−1,1−α/2}

This can be rearranged to get

    X̄ − t_{n−1,1−α/2} S/√n < μ < X̄ − t_{n−1,α/2} S/√n

But a t distribution is symmetric, hence t_{n−1,α/2} = −t_{n−1,1−α/2} and

    X̄ − t_{n−1,1−α/2} S/√n < μ < X̄ + t_{n−1,1−α/2} S/√n

and hence

    P(X̄ − t_{n−1,1−α/2} S/√n < μ < X̄ + t_{n−1,1−α/2} S/√n) = 1 − α    (2.11)

Figure 2.2. The t test (the t_{n−1} density with area α/2 in each tail, beyond t_{n−1,α/2} and t_{n−1,1−α/2}).

Therefore the 100(1−α)% confidence interval for μ is X̄ ± t_{n−1,1−α/2} S/√n. This implies that 100(1−α)% (e.g. 95%) of confidence intervals constructed from samples of size n will have μ within their bounds. For large n (at least > 30, sometimes n > 200), approximate the t distribution with the Normal distribution; the 100(1−α)% confidence interval of μ is then X̄ ± Z_{1−α/2} S/√n.
The width of a confidence interval = 2 t_{n−1,1−α/2} S/√n = f(n, S, α). As n increases, the CI width decreases. As 1−α increases, the CI width increases, and therefore in increasing 1−α from 0.95 to 0.99, a larger CI will be needed. The CI width also increases with increasing S. S can be decreased by replacing one measurement with the mean of replicates, for example.
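A minimal sketch of the t-based interval, assuming raw data is available (the data below is invented):

```python
import numpy as np
from scipy import stats

x = np.array([4.9, 5.1, 4.7, 5.4, 5.0, 4.8, 5.2])   # hypothetical measurements
n, alpha = len(x), 0.05
xbar, s = x.mean(), x.std(ddof=1)

tcrit = stats.t.ppf(1 - alpha / 2, df=n - 1)        # t_{n-1, 1-alpha/2}
half = tcrit * s / np.sqrt(n)
print(f"95% CI for mu: [{xbar - half:.3f}, {xbar + half:.3f}]")
```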

2.1.5 Estimation of the variance

If individual observations are ∼ N(μ, σ²), then S² = Σᵢ(Xᵢ − X̄)²/(n−1) and E[S²] = σ². Hence S² is an unbiased estimator of σ². As explained before, using Σᵢ(Xᵢ − X̄)²/n underestimates σ² by the factor (n−1)/n.

χ² distribution: The χ² distribution is the sampling distribution of S² and is used to compute point and interval estimates for σ². It is assumed that the underlying distribution of the measurements made (X₁...Xₙ) is Normal (N(μ, σ²)). (The same assumption was made for the mean, in developing the t distribution.) If the measurements X₁...Xₙ are independent and are each ∼ N(0, 1), then G = Σᵢ₌₁ⁿ Xᵢ² follows a χ² distribution with n degrees of freedom, denoted χ²ₙ. Once again, like the t distribution, this is a family of distributions based on n. The t distribution was symmetric about 0 for any n; the χ² distribution is always > 0 and is skewed to the right, and is therefore not symmetric. This skew decreases as n increases. The expected value of a χ²ₙ distribution is n and the variance is 2n. The u-th percentile of a χ²ₙ distribution satisfies P(χ²ₙ < χ²_{n,u}) = u. (Learn how to read χ² tables and compute percentiles.)

Interval estimation of population variance σ²: To get an interval estimate, we need the sampling distribution of S². Let X₁...Xₙ be ∼ N(μ, σ²). Then Zᵢ = (Xᵢ − μ)/σ ∼ N(0, 1). Therefore

    Σᵢ Zᵢ² = Σᵢ (Xᵢ − μ)²/σ² ∼ χ²ₙ    (2.12)

i.e. Σᵢ Zᵢ² follows a χ² distribution with degrees of freedom (df) = n. However, μ is unknown and is itself estimated by X̄. In substituting μ by X̄, one df is lost. Therefore

    Σᵢ₌₁ⁿ (Xᵢ − X̄)²/σ² ∼ χ²ₙ₋₁

But

    S² = Σᵢ (Xᵢ − X̄)²/(n−1)  ⟹  (n−1)S² = Σᵢ (Xᵢ − X̄)² ∼ σ² χ²ₙ₋₁

and hence

    S² ∼ (σ²/(n−1)) χ²ₙ₋₁    (2.13)

S² therefore follows a χ²ₙ₋₁ distribution with df = n−1, multiplied by σ²/(n−1). (Compare (n−1)S²/σ² with χ²ₙ₋₁.)
For interval estimation,

    P(σ² χ²_{n−1,α/2}/(n−1) < S² < σ² χ²_{n−1,1−α/2}/(n−1)) = 1 − α    (2.14)

which gives

    (n−1)S²/χ²_{n−1,1−α/2} < σ² < (n−1)S²/χ²_{n−1,α/2}

and this implies

    P((n−1)S²/χ²_{n−1,1−α/2} < σ² < (n−1)S²/χ²_{n−1,α/2}) = 1 − α    (2.15)

The 100(1−α)% confidence interval for σ² is

    [(n−1)S²/χ²_{n−1,1−α/2}, (n−1)S²/χ²_{n−1,α/2}]    (2.16)

(The χ² distribution assumes that the underlying X is normally distributed. If the Xᵢ are not normally distributed, the confidence interval may not be at level 1−α at all, even at large n. This is unlike the confidence interval for μ, which works for large n even if the Xᵢ are not normal.)
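Equation (2.16) in code form (a sketch with invented data; note that the larger χ² percentile appears in the denominator of the lower limit):

```python
import numpy as np
from scipy import stats

x = np.array([4.9, 5.1, 4.7, 5.4, 5.0, 4.8, 5.2])   # hypothetical data
n, alpha = len(x), 0.05
s2 = x.var(ddof=1)

lo = (n - 1) * s2 / stats.chi2.ppf(1 - alpha / 2, df=n - 1)
hi = (n - 1) * s2 / stats.chi2.ppf(alpha / 2, df=n - 1)
print(f"S^2 = {s2:.4f},  95% CI for sigma^2: [{lo:.4f}, {hi:.4f}]")
```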

2.1.6 Estimation for the Binomial distribution

We wish to estimate the value of p of a binomial distribution. Let the prevalence of a disease be p, and let Xᵢ = a Bernoulli variable = 0 (healthy) or 1 (sick) for i = 1...n. (Recall that for a binomial distribution, E[X] = np and Var[X] = np(1−p) = npq.) How does one estimate p? The number of people with the disease among the n people is X = Σᵢ Xᵢ. Let p̂ = the sample proportion of events = the ratio of people with the disease in the sample. Then p̂ = Σᵢ Xᵢ/n = X/n = a sample mean. Hence

    E[p̂] = E[X/n] = E[X]/n = np/n = p    (2.17)

Similarly,

    Var[p̂] = Var[X/n] = Var[X]/n² = np(1−p)/n² = p(1−p)/n    (2.18)

For a sampled proportion, E[p̂] = p and Var[p̂] = pq/n. The standard error of p̂ is √(pq/n), and p̂ is an unbiased estimator of the population parameter p for any n. The standard error √(pq/n) is estimated (given that we do not know p) as √(p̂q̂/n).

Normal theory method Assuming that a point estimate of p (= p^) is available, The number of successes in n
what is the interval estimate? IfPthe Normal distribution may be used to approximate Bernoulli trials is X = np^ which
the binomial then p^ = X=n = i Xi =n is normally distributed with mean = p and implies that X is a Binomial ran-
variance is pq=n. For n Bernoulli trials, each with a mean p and variance p(1 p), dom variable with parameters n
then for large n, CLT indicates that p^ = X = normally distributed with mean  = p and p and hence np^ = X 
and variance  2 =n = pq=n, or p^  N (p; pq=n). N (np; npq).
The normal approximation to the Binomial distribution was assumed to be valid
if npq  5. However, here p (and q ) are unknown. Therefore we estimate p by p^ and
q by 1 p^. To evaluate the 100%  (1 ) con dence interval for p,
 r r 
pq pq
P p Z1 =2 < p^ < p + Z1 =2 =1
n n
r r
pq pq
p Z1 =2 < p^ and p^ < p + Z1 =2
n n
both of which are quadratics in p. Rather than solve these, we can use pq=n  p^q^=n,
and rearrange the inequalities to give Use the normal approximation to
r r ! the binomial if np^q^  5.
p^q^ p^q^
P p^ Z1 =2 < p < p^ + Z1 =2 =1 (2.19)
n n
and hence the 100%  (1 ) con dence interval for p is
" r r #
p^q^ p^q^
p^ Z1 =2 ; p^ + Z1 =2
n n
The maximum error of the estimate is
r
jp^ pj = Z1 =2 p(1n p)
The sample size to achieve 100%  (1 ) con dence interval is For p = q = 1=2,
 2  2
Z1 =2 1 Z1 =2
n = p(1 p) (2.20) n=
jX j 4 jX j
Exact method of interval estimation A 100%  (1 ) con dence interval for p
is given by [p1 ; p2 ] where
n
X
P (X  xjp = p1 ) = = n C pk (1
k 1 p1 )n k
2 i=1
n
X
P (X  xjp = p2 ) = = n C pk (1
k 2 p2 )n k (2.21)
2 i=1
The problem in using this is in computing the summations on the right.
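Both intervals are straightforward to compute; the exact (Clopper-Pearson) limits of (2.21) are conventionally obtained from Beta quantiles rather than by solving the binomial sums directly. A sketch with invented counts:

```python
import numpy as np
from scipy import stats

x, n, alpha = 20, 100, 0.05        # 20 successes out of 100 (hypothetical)
phat = x / n

# Normal theory method (2.19); requires n*phat*(1-phat) >= 5
z = stats.norm.ppf(1 - alpha / 2)
se = np.sqrt(phat * (1 - phat) / n)
print(f"normal: [{phat - z * se:.4f}, {phat + z * se:.4f}]")

# Exact (Clopper-Pearson) limits via Beta quantiles
lo = stats.beta.ppf(alpha / 2, x, n - x + 1) if x > 0 else 0.0
hi = stats.beta.ppf(1 - alpha / 2, x + 1, n - x) if x < n else 1.0
print(f"exact : [{lo:.4f}, {hi:.4f}]")
```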

2.2 Hypothesis testing

The goal of statistical inference is to decide on some hypothesis. The hypothesis could be about the value of some parameter (or parameters), or even a qualitative statement (e.g. this is a second order reaction). Usually the parameters are yet to be observed/induced; in some cases they may even be unobservable. Testing a hypothesis is done using point and interval estimation. The classical method involves assuming that X follows a particular distribution, and identifying appropriate central and interval estimates. Computer-intensive methods which do not assume a specific distribution are now routinely employed: these attempt to generate distributions (and hence the desired quantiles) before employing a hypothesis test. The testing procedure in both cases is identical: we specify a null and an alternative hypothesis and then contrast them. The null hypothesis is denoted H₀ and is usually to be disproved. (H₀: innocent until proven guilty.)
One sample inference: hypotheses about a single distribution are evaluated. Two sample inference: two different distributions are compared. In general, multi-sample inference may be called for.

2.2.1 One sample inference about the mean

Example: The nationwide average score in an IQ test is 120. In a poor area, 100 student scores are analyzed, with mean X̄ = 115 and sd(X) = 24. Is the area average lower than the national average?
Soln: Assume the 100 scores follow a normal distribution with unknown population mean μ. Then construct a 95% lower one-sided confidence interval for μ based on the sample data (an interval of the form μ < C). If C ≥ 120, then the area scores are similar to the national scores. Else, if C < 120, the area score is not similar to the national score.
H₀ = null hypothesis = hypothesis to be tested = μ (area scores) = μ₀ (national scores). H₁ = alternative hypothesis = contradiction of H₀ = μ (area scores) < μ₀ (national scores). Under each hypothesis, we are assuming a normal distribution. Our focus could be on μ or μ₀ with respect to the outcomes, but the convention is to focus on H₀ and decide whether it is true or not true. (The notation we use is H₀: μ = μ₀ vs. H₁: μ < μ₀.) There are 4 possible outcomes:
1. Accept H₀ as true, and in fact it is true.
2. Accept H₀ as true; in fact it is not true.
3. Reject H₀; in fact H₀ is true.
4. Reject H₀; in fact H₀ is not true/H₁ is true.

                   Truth: H₀                 Truth: H₁
    Accept H₀      case 1 (correct)          case 2 (type II error)
    Reject H₀      case 3 (type I error)     case 4 (correct)

In practice, we usually cannot prove that H₀ is true. We can accept H₀ by failing to reject H₀ (i.e. it is usually easier to disprove than to prove). In cases 1 and 4, the correct decision is made. Case 3 (H₀ was true but was rejected) is referred to as a Type I error. Case 2 (H₁ was true, but H₀ was accepted) refers to a Type II error.
Example: Type I error: concluding that μ(area) < μ(national) when in reality μ(area) = μ(national). Type II error: concluding that μ(area) = μ(national) when in fact μ(area) < μ(national).
Example: Diagnostic kits: false positives are type II errors and false negatives are type I errors.

Figure 2.3. Type I and II errors in a hypothesis test (two densities centred at μ₀ and μ₁; the rejection areas α/2 and the error area β are marked).

Definitions:
• The probability of a type I error = α = the significance level of the test applied.
• The probability of a type II error = β.
• The power of a test = 1 − β = 1 − P(type II error) = 1 − P(accepting H₀ | H₁ true) = P(rejecting H₀ | H₁ true).

The strategy of hypothesis testing: When testing a hypothesis, statistical tests must be used such that α and β are made as small as possible. However, making α small implies that we reject H₀ less often, and making β small implies that we accept H₀ less often. This sets up a contradiction, and usually, as α increases, β decreases and vice versa. We can fix α at some specific level (0.01, 0.05, 0.1 etc.) and use a test that minimizes β (or maximizes the power). (For fixed α, we want the test of maximum power.)
Let H₀ imply that the random variable being evaluated follows a distribution with mean μ₀, and let H₁ imply a distribution with mean μ₁. Let Δ = μ₀ − μ₁. If α is fixed, as Δ is decreased, β is increased and hence the power of the test = 1 − β is decreased: it is more difficult to discriminate between the two means. If Δ is fixed, the only way to increase 1 − β is to increase α (i.e. move the critical value to the left, in the plot). The only way to decrease α is to decrease 1 − β. In the extreme, type I errors may be avoided entirely by always rejecting H₁ (i.e. never rejecting H₀).
There is a practical problem: if μ₀ and μ₁ are fixed (μ₁ may be unknown, but is assumed constant, and so is considered fixed), then Δ is fixed. Then, when α is chosen, the power may be small. The solution to this is to sample more: if the variances of the sampling distributions are decreased (by increasing the sample size; sd(X̄) = σ/√n and is estimated by s/√n), then both type I and type II errors may be decreased. This is done by increasing the sample size.

Figure 2.4. Increasing the sample size results in more power.

2.2.2 One sample test for μ: lower one-sided test

The random variable X is assumed to be N(μ, σ²). For the example of the IQ scores, H₀: μ = μ₀ = 120 vs. H₁: μ < μ₀. For example, a specific alternative hypothesis could be H₁: μ = μ₁ = 110 < μ₀. Assuming that we fix α at 0.05 (95% confidence), there are many possible tests. However, we want the test with the highest power (smallest β).
The best estimator of μ is X̄, so the best test would be one based on X̄. If X̄ is sufficiently < μ₀, then H₀ is rejected, else accepted. If H₀ is true, the values of X̄ will most likely cluster around μ₀. If H₁ is true, the values of X̄ will most likely cluster around μ₁. Acceptance region: the range of values of X̄ for which H₀ is accepted. Rejection region: the range of values of X̄ for which H₀ is rejected.
For the scores example, H₁: μ = μ₁ < μ₀ (the mean under H₁ is less than the mean under H₀), and hence the rejection region for H₀ is where X̄ is small (hence a one-tailed test). For a one-tailed test, the value of the parameter being studied (here μ) in the alternative hypothesis H₁ is either greater or lower than the value in the null hypothesis (μ₀), i.e. H₁: μ < μ₀ or else of the form H₁: μ > μ₀, but not H₁: μ₁ < μ < μ₂.
For this example (H₁: μ < μ₀), how small should X̄ be for us to reject H₀, given α (i.e. the confidence level)? Let H₀ be rejected for all X̄ < C, and accepted otherwise. This implies that C must be chosen such that the type I error = α. Instead of using X̄ < C, switch to standard notation: t = (X̄ − μ₀)/(S/√n), which would follow a t_{n−1} distribution under H₀, and hence C corresponds to P(t < t_{n−1,α}) = α.

Procedure: One sample test for μ: lower one-sided test: To test H₀: μ = μ₀ (σ unknown) vs. H₁: μ < μ₀ (σ unknown), using a significance level of α:
• Find the test statistic

    t = (X̄ − μ₀)/(S/√n)

and if t < t_{n−1,α}, reject H₀; if t ≥ t_{n−1,α}, accept H₀.
• t_{n−1,α} is a critical value. For t < t_{n−1,α}, H₀ is rejected, and for t ≥ t_{n−1,α}, H₀ is accepted.
• Method 1: The critical value method of hypothesis testing depends on α (the type I error). The level of α used should depend on the relative importance of type I and type II errors. For fixed n, as α increases, β decreases and vice versa. Usually α = 0.05.
• Method 2: Instead of performing the critical value test at various α (and observing whether H₀ is accepted or rejected at each value), perform the test at all α values by obtaining the p value.

p value: The p value is the α level at which no decision can be made between accepting and rejecting H₀. This is the level at which t is the borderline between acceptance and rejection. Hence there is indifference with respect to H₀ if t = t_{n−1,p}, and the p value = P(t_{n−1} ≤ t). Therefore the p value is indicative of the significance level. (Hypothesis testing clearly indicates p values, which show where significance ends. However, sometimes statistical significance is merely a result of large n. A 95% confidence interval gives a range of values for μ and is hence informative; it does not, however, reveal anything about significance at a higher level. Hence both p values and a 95% confidence interval for μ should be computed. The medical community usually uses CI = 95%.)

    Range               Implication
    0.01 ≤ p < 0.05     results are significant
    0.001 ≤ p < 0.01    results are highly significant
    p < 0.001           results are very highly significant
    p > 0.05            results are not statistically significant
    0.05 < p < 0.1      one may only consider trends

Summary: One sample test for μ: lower one-sided test

1. Critical value method: Compute t = (X̄ − μ₀)/(s/√n) and compare with t_{n−1,α} at α = 0.05. If H₀: μ = μ₀ vs. H₁: μ < μ₀ is being tested, and t < t_{n−1,0.05}, then H₀ is rejected with a result which is statistically significant (p < 0.05). Else H₀ is accepted (p ≥ 0.05).
2. p value method: Find the p value (P(t_{n−1} ≤ t)). Then H₀ is rejected if p < 0.05 (a statistically significant result), else H₀ is accepted. This method is easier than the critical value method and is slightly more informative, because it gives an exact value of p (whereas the critical value method only brackets p).

Figure 2.5. The p value (the tail area of the t_{n−1} density beyond the observed t).

Scores example (critical and p value): H₀: μ = 120 vs. H₁: μ < 120. Using α = 0.05, t = (X̄ − μ₀)/(s/√n) = (115 − 120)/(24/√100) = −2.08, while t_{n−1,α} = t₉₉,₀.₀₅ = −1.66. Since −2.08 < −1.66 (t < t_{n−1,α}), H₀ is rejected at a significance level of 0.05. At α = 0.01, t₉₉,₀.₀₁ = −2.36, so t_{n−1,α} < t and H₀ should be accepted at a significance level of 0.01. The p value is P(t_{n−1} < t), hence P(t₉₉ < −2.08) = 0.020 (a statistically significant result).
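A sketch of the scores example from its summary statistics (scipy's ttest_1samp needs raw data, so the statistic and p value are computed directly here):

```python
import numpy as np
from scipy import stats

xbar, mu0, s, n, alpha = 115.0, 120.0, 24.0, 100, 0.05

t = (xbar - mu0) / (s / np.sqrt(n))       # -2.08
tcrit = stats.t.ppf(alpha, df=n - 1)      # t_{99,0.05} = -1.66
pval = stats.t.cdf(t, df=n - 1)           # lower tail area: P(t_99 < t)
print(f"t = {t:.2f}, critical value = {tcrit:.2f}, p = {pval:.3f}")
# t < critical value, so H0 is rejected at alpha = 0.05 (p ~ 0.020)
```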
Scores example (modified): Suppose 10,000 scores were measured, with mean = 119 and s = 24. Then t = (X̄ − μ₀)/(s/√n) = (119 − 120)/(24/√10000) = −4.17 and P(t_{n−1} < −4.17) = P(t₉₉₉₉ < −4.17). But a t distribution with df = 9999 is ≈ N(0, 1), and hence the p value is Φ(−4.17) < 0.001: a very highly significant result. However, from a practical standpoint, X̄ = 119 ≈ μ₀ = 120, and hence the result is scientifically insignificant even though it is statistically significant. Conversely, statistically insignificant results may become scientifically significant with more sampling. (As statistics became increasingly applied to policy making by politicians in the late 1800s, a quote, often attributed to Mark Twain, became increasingly popular: "There are three types of lies: lies, damned lies, and statistics.")

2.2.3 One sample t test for the mean (upper one-sided)

To test H₀: μ = μ₀ vs. H₁: μ > μ₀, with a significance level of α, use t = (X̄ − μ₀)/(s/√n).
If t > t_{n−1,1−α}, then H₀ is rejected.
If t ≤ t_{n−1,1−α}, then H₀ is accepted.
The p value for this test is p = P(t_{n−1} > t).

2.2.4 One sample test for μ, unknown variance: two-sided test

We have assumed a priori knowledge about sidedness: national scores have been assumed to be greater than area scores. If H₀ is untrue and we are unsure as to which side of μ₀ the alternative mean may fall, a two-tailed test for the mean is required: μ under the alternative hypothesis is > or < the μ₀ from H₀.
Scores example: H₀: area score = national score (μ = μ₀). H₁: μ ≠ μ₀. The best test depends on X̄ (or t). We reject H₀ if t is either too small or too large: reject H₀ if t < C₁ or if t > C₂, and accept H₀ otherwise (C₁ ≤ t ≤ C₂), where C₁ = t_{n−1,α/2} and C₂ = t_{n−1,1−α/2}.

Summary: One sample test for μ, unknown variance: two-sided test H₀: μ = μ₀ vs. H₁: μ ≠ μ₀, with significance level α. Compute t = (X̄ − μ₀)/(S/√n).
If |t| > t_{n−1,1−α/2}, reject H₀.
If |t| ≤ t_{n−1,1−α/2}, accept H₀.
The p value may be computed as follows: for t ≤ 0, p = 2·P(t_{n−1} ≤ t) = twice the left-hand tail area. For t > 0, p = 2·[1 − P(t_{n−1} ≤ t)] = the area to the right of t + the area to the left of −t = twice the right-hand tail area.
For large n (> 30), the t distribution percentile t_{n−1,1−α/2} may be approximated by the corresponding percentile of N(0, 1), i.e. Z_{1−α/2}. The p value may then be computed from P(N(0, 1) < t) = Φ(t).

One-sided vs. two-sided tests Usually one-sided tests are used: sample means hopefully fall on one expected side of μ₀. A two-sided test can always be used: μ ≠ μ₀ is also implied by μ < μ₀. The two-sided approach is more conservative: we do not have to guess the appropriate side. If only one-sidedness is expected, the one-sided test should be used: it has more power, and hence it is easier to reject H₀ with finite samples if H₁ is true. DO NOT change from two-sided to one-sided tests AFTER looking at the data.
(Use the one sample t test if: there is one variable of interest; the underlying distribution is normal or the CLT is assumed to hold; an inference concerning μ is required; and σ is NOT known.)

Two-sided one sample Z test t tests assume that σ is unknown. If σ is known, then t may be replaced by Z = (X̄ − μ₀)/(σ/√n), and critical values based on the t_{n−1} distribution may be replaced with the corresponding values of the N(0, 1) distribution.

2.2.5 Power of a one sample test for μ

This calculation needs to be done when planning a study. Usually data is not available, and at best a pilot study with a small sample size is performed.
Example: A 10-patient pilot study for a glaucoma drug is performed by measuring intraocular pressure (IOP). In the pilot study, the mean IOP decreases on using the drug by 5 mm Hg (standard deviation (SD) of 10 mm Hg). Are 100 people enough for the real study?
Soln: Power = the probability of declaring that the drug makes a difference, with sample size 100, if the true mean IOP drop is 5 mm Hg with SD = 10 mm Hg. If the power turns out to be greater than 80%, the larger study may be performed. (You want power to be at least 80%.) Since σ = 10 mm Hg is known, the one-sided Z test may be used, with H₀: μ = μ₀ vs. H₁: μ = μ₁ < μ₀. Then, for a significance level of α, H₀ is rejected if Z < Z_α. Notice that this test does not depend on the alternative mean μ₁, as long as μ₁ < μ₀.
The power of a test = 1 − P(type II error) = 1 − β = P(reject H₀ | H₀ is false). For a one sample, lower one-sided test, power = P(Z < Z_α | μ = μ₁)

    = P((X̄ − μ₀)/(σ/√n) < Z_α | μ = μ₁) = P(X̄ < μ₀ + Z_α σ/√n | μ = μ₁)

But under H₁, X̄ ∼ N(μ₁, σ²/n), and hence the power is

    Φ((μ₀ + Z_α σ/√n − μ₁)/(σ/√n)) = Φ(Z_α + (μ₀ − μ₁)√n/σ)    (2.22)

Figure 2.6. Power of a one sample, one-sided test (the H₀ and H₁ densities, with the rejection cutoff at μ₀ + Z_α σ/√n).

The power indicates how likely it is that a significant difference would be found, given that H₁ is true. If the power is small, there is only a small chance of finding a significant difference even if H₁ is true (i.e., the true mean is in reality different from the null mean).
For an upper one-sided test, power = P(X̄ > μ₀ + Z_{1−α} σ/√n | μ = μ₁) = 1 − P(X̄ < μ₀ + Z_{1−α} σ/√n | μ = μ₁)

    = 1 − Φ((μ₀ + Z_{1−α} σ/√n − μ₁)/(σ/√n)) = 1 − Φ(Z_{1−α} + (μ₀ − μ₁)√n/σ)

Using Φ(−x) = 1 − Φ(x) and Z_α = −Z_{1−α} gives (for μ₁ > μ₀)

    Φ(−Z_{1−α} + (μ₁ − μ₀)√n/σ) = Φ(Z_α + (μ₁ − μ₀)√n/σ)    (2.23)

Scores example: Assume μ₁ = 115, σ = 24 and α = 0.05. Since μ₀ = 120, μ₁ = 115, σ = 24, α = 0.05 and n = 100, the power is

    Φ(Z₀.₀₅ + (120 − 115)/(24/√100)) = Φ(0.438) = 0.669

Hence there is only a 67% chance of detecting a significant difference using a 5% significance level, with sample size = 100.

Summary: power of a one-sided one sample Z test: The Z test is used for the mean of a normal distribution with known variance. In general, H₀: μ = μ₀ vs. H₁: μ = μ₁.
Power = Φ(Z_α + |μ₀ − μ₁|√n/σ) = Φ(−Z_{1−α} + |μ₀ − μ₁|√n/σ), and hence the power depends on α, |μ₀ − μ₁|, n and σ:
• As α decreases, Z_α decreases and hence the power decreases.
• As |μ₁ − μ₀| increases, the power increases.
• As σ increases, the power decreases.
• As n increases, the power increases.
A power curve can be drawn through various μ₁, given α = 0.05, σ = 24, n = 100, μ₀ = 120.
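Equation (2.22) applied to the scores example (a sketch reproducing the 0.669 computed above):

```python
import numpy as np
from scipy.stats import norm

mu0, mu1, sigma, n, alpha = 120.0, 115.0, 24.0, 100, 0.05

z_alpha = norm.ppf(alpha)                 # Z_0.05 = -1.645
power = norm.cdf(z_alpha + (mu0 - mu1) * np.sqrt(n) / sigma)
print(f"power = {power:.3f}")             # ~0.669
```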

Power of a two-sided Z test: H₀: μ = μ₀ vs. H₁: μ = μ₁ ≠ μ₀. Then H₀ may be rejected if (remembering that Z = (X̄ − μ₀)/(σ/√n))
• Z < Z_{α/2}, which implies that X̄ < μ₀ + Z_{α/2} σ/√n, or if
• Z > Z_{1−α/2}, which implies that X̄ > μ₀ + Z_{1−α/2} σ/√n.
The power = P(X̄ < μ₀ + Z_{α/2} σ/√n | μ = μ₁) + P(X̄ > μ₀ + Z_{1−α/2} σ/√n | μ = μ₁)

    = Φ((μ₀ + Z_{α/2} σ/√n − μ₁)/(σ/√n)) + 1 − Φ((μ₀ + Z_{1−α/2} σ/√n − μ₁)/(σ/√n))
    = Φ(Z_{α/2} + (μ₀ − μ₁)√n/σ) + 1 − Φ(Z_{1−α/2} + (μ₀ − μ₁)√n/σ)

But 1 − Φ(x) = Φ(−x), and hence the power is

    Φ(Z_{α/2} + (μ₀ − μ₁)√n/σ) + Φ(−Z_{1−α/2} + (μ₁ − μ₀)√n/σ)

But Z_{α/2} = −Z_{1−α/2}, and therefore the power is

    Φ(−Z_{1−α/2} + (μ₀ − μ₁)√n/σ) + Φ(−Z_{1−α/2} + (μ₁ − μ₀)√n/σ)

For μ₁ < μ₀, the second term is usually negligible, and for μ₁ > μ₀, the first term is negligible. Therefore the power may be approximated by

    Φ(−Z_{1−α/2} + |μ₀ − μ₁|√n/σ)    (2.24)

2.2.6 Sample size determination

Scores example: H₀: μ = μ₀ vs. H₁: μ = μ₁ < μ₀. Then H₀ and H₁ are assumed to be normal distributions with σ known. At a significance level α, H₀ is rejected if Z < Z_α, or X̄ < μ₀ + Z_α σ/√n, and is accepted if Z ≥ Z_α, or X̄ ≥ μ₀ + Z_α σ/√n. The investigator probably has an idea of μ₁ already. Assume that H₁ is actually true. Further, choose a value for the power (1 − β), like 80% or 90% (i.e. choose the probability of rejecting H₀ given that H₁ is true). Then, given a significance test at level α and given that the true alternative mean is μ₁, what sample size is needed to detect a significant difference with probability 1 − β?
Since H₀ is rejected if X̄ < μ₀ + Z_α σ/√n, the area under the H₁ curve to the left of μ₀ + Z_α σ/√n must be 1 − β. This can be achieved by making n sufficiently large. (As n increases, the variance of each curve = σ²/n decreases, and hence the curves separate.)
The power = 1 − β = Φ(Z_α + |μ₀ − μ₁|√n/σ). But Φ(Z_{1−β}) = 1 − β. Hence Z_α + |μ₀ − μ₁|√n/σ = Z_{1−β}, and noting that Z_α = −Z_{1−α},

    n = (Z_{1−α} + Z_{1−β})² σ² / (μ₀ − μ₁)²    (2.25)

Notice the symmetry: this result holds for either a one-sided upper or a one-sided lower test. Notice also that n depends (rather sensitively) on (μ₀ − μ₁)². Usually μ₀ is known, α = 0.05 and power ≥ 80%; appropriate values of μ₁ and σ² are usually unknown, and some guesses may be made using prior knowledge.
(Various factors affect n as follows: as σ² increases, n increases; as α decreases, Z_{1−α} increases, hence n increases; as the required power 1 − β increases, n increases; as |μ₀ − μ₁| increases, n decreases.)
The term |μ₀ − μ₁| must be evaluated for scientific significance. A pilot study may be done (with small n) to get a feel for μ₁ and σ², usually when the investigator is 'convinced' that H₁ is right, and μ = μ₁ and not μ = μ₀ (for example, when alternate evidence indicates that a drug should work).

Sample size determination for a two-sided test For a two-sided test with σ known, with desired significance level α and desired power 1 − β, from the power of a two-sided test,

    1 − β = Φ(−Z_{1−α/2} + |μ₀ − μ₁|√n/σ)

which implies that

    Z_{1−β} = −Z_{1−α/2} + |μ₀ − μ₁|√n/σ

and hence

    n = (Z_{1−β} + Z_{1−α/2})² σ² / (μ₀ − μ₁)²    (2.26)

Note that n for a two-sided test is greater than the n for a one-sided test (because Z_{1−α/2} > Z_{1−α}).
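Equations (2.25) and (2.26) in code (a sketch using the scores example values; n is rounded up to the next integer):

```python
import numpy as np
from scipy.stats import norm

mu0, mu1, sigma, alpha, power = 120.0, 115.0, 24.0, 0.05, 0.80
z_beta = norm.ppf(power)                                   # Z_{1-beta}

n_one = (norm.ppf(1 - alpha) + z_beta) ** 2 * sigma**2 / (mu0 - mu1) ** 2
n_two = (norm.ppf(1 - alpha / 2) + z_beta) ** 2 * sigma**2 / (mu0 - mu1) ** 2
print("one-sided n:", int(np.ceil(n_one)))                 # 143
print("two-sided n:", int(np.ceil(n_two)))                 # 181
```

As expected, the two-sided test needs the larger sample.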

2.3 One sample test (χ² test) for the variance of a normal distribution (two-sided)

H₀: σ² = σ₀² vs. H₁: σ² ≠ σ₀². If the measurements X₁...Xₙ are random samples, then S² may be used as an unbiased estimator of σ². If these measurements are from a normal distribution N(μ, σ²), then H₀ implies that

    X² ≡ (n−1)S²/σ₀² ∼ χ²ₙ₋₁

and therefore

    P(X² < χ²_{n−1,α/2}) = α/2 = P(X² > χ²_{n−1,1−α/2})    (2.27)

Hence H₀ may be accepted for χ²_{n−1,α/2} ≤ X² ≤ χ²_{n−1,1−α/2}, and rejected otherwise. (Note that X² = (n−1)s²/σ₀² according to H₀.)
The p value for the two-sided test depends on whether S² ≤ σ₀² or S² > σ₀². If S² ≤ σ₀², then the p value = twice the area to the left of X² under a χ²ₙ₋₁ distribution. If S² > σ₀², then the p value = twice the area to the right of X² under a χ²ₙ₋₁ distribution.
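A sketch of this test with invented summary values:

```python
from scipy import stats

n, s2, sigma0_sq, alpha = 20, 8.4, 5.0, 0.05     # hypothetical values
X2 = (n - 1) * s2 / sigma0_sq                    # test statistic

lo = stats.chi2.ppf(alpha / 2, df=n - 1)
hi = stats.chi2.ppf(1 - alpha / 2, df=n - 1)
# two-sided p value = twice the smaller tail area
p = 2 * min(stats.chi2.cdf(X2, df=n - 1), stats.chi2.sf(X2, df=n - 1))
print(f"X2 = {X2:.2f}, acceptance region = [{lo:.2f}, {hi:.2f}], p = {p:.3f}")
```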

2.4 One sample test for a binomial proportion

Using the normal theory approximation, consider the following example.

Example: The average prevalence of breast cancer is 2%. The prevalence of breast cancer in women whose mothers had breast cancer is 4/100 = 4%. Is the 4% important with respect to the 2% (i.e. is there a hereditary aspect to breast cancer)? If p is the prevalence rate among women whose mothers had cancer, we can test H₀: p = 0.02 vs. H₁: p ≠ 0.02.
We use the sample proportion of cases, p̂, assuming that the normal approximation to the binomial is valid (np₀q₀ ≥ 5), where p₀ is the prevalence rate according to H₀ and q₀ = 1 − p₀. Then, under H₀, p̂ ∼ N(p₀, p₀q₀/n). Standardizing p̂ using Z = (p̂ − p₀)/standard error = (p̂ − p₀)/√(p₀q₀/n) gives Z ∼ N(0, 1) under H₀. Therefore,

    P(Z < Z_{α/2}) = P(Z > Z_{1−α/2}) = α/2

• H₀ is rejected if Z < Z_{α/2} or Z > Z_{1−α/2}.
• H₀ is accepted if Z_{α/2} ≤ Z ≤ Z_{1−α/2}.
The p value of the test depends on whether p̂ ≤ p₀ or p̂ > p₀. If p̂ ≤ p₀, then p = 2Φ(Z); if p̂ > p₀, then p = 2[1 − Φ(Z)].

Power of a one sample, two-sided binomial test: The hypotheses are usually written as H₀: p = p₀ vs. H₁: p ≠ p₀. For a specific alternative p₁, using the normal theory approach where np₀q₀ ≥ 5,

    power = Φ( √(p₀q₀/(p₁q₁)) · (Z_{α/2} + |p₀ − p₁|√n/√(p₀q₀)) )    (2.28)

Sample size for a one sample, two-sided binomial test: H₀: p = p₀ vs. H₁: p ≠ p₀. Then the sample size n should be

    n = p₀q₀ [Z_{1−α/2} + Z_{1−β} √(p₁q₁/(p₀q₀))]² / (p₁ − p₀)²    (2.29)

One-sided one sample binomial tests: Replace α/2 by α in the above two equations (for power and sample size).
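A sketch of the breast cancer example. Note that np₀q₀ = 100 × 0.02 × 0.98 = 1.96 < 5 here, so the normal theory method is marginal and an exact test is preferable; both are shown (scipy.stats.binomtest requires scipy ≥ 1.7):

```python
import numpy as np
from scipy import stats

x, n, p0 = 4, 100, 0.02
phat = x / n

# Normal theory method (marginal here, since n*p0*q0 < 5)
Z = (phat - p0) / np.sqrt(p0 * (1 - p0) / n)
p_norm = 2 * (1 - stats.norm.cdf(abs(Z)))
print(f"Z = {Z:.2f}, normal-theory p = {p_norm:.3f}")

# Exact binomial test
print("exact p =", round(stats.binomtest(x, n, p0).pvalue, 3))
```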

2.5 Two sample inferences

The parameters of two different distributions are compared. No specific values are assumed for the parameters under the alternative hypothesis. In clinical trials this usually occurs during
• longitudinal studies. Example: follow-up of the same patient over time. Here samples will have to be paired by patient.
• cross-sectional studies, where patients are probably seen only once. This requires an independent sample design.
Paired sample analysis is required to prevent confounding (a situation where other, unaccounted-for factors influence the analysis). Independent sample analysis involves some confounding, but is usually cheaper to carry out in the context of clinical trials.

2.5.1 The paired t test for dependent samples

The paired t test may be used when
• the problem is a two sample problem,
• the random variable has an underlying normal distribution, or the CLT is assumed to hold,
• inferences about means are required,
• and the samples are not independent.

BP example: Consider the blood pressures of 10 people, before and after using a drug:

    i     xᵢ₁ (pre)   xᵢ₂ (post)   dᵢ
    1     115         128          13
    2     112         115          3
    3     107         106          −1
    4     119         128          9
    5     115         122          7
    6     138         145          7
    7     126         132          6
    8     105         109          4
    9     104         102          −2
    10    115         117          2

We assume that the BP(pre) of the i-th patient is a variable xᵢ₁ which is normally distributed with mean μᵢ and variance σ². After the drug has been taken, we can assume that the BP(post) of the i-th patient (xᵢ₂) also follows a normal distribution, but with a shifted mean: N(μᵢ + Δ, σ²). Let dᵢ = xᵢ₂ − xᵢ₁. Obviously, if Δ = 0, then the drug has no effect. However, the μᵢ are unknown and could be different for different people. Note that dᵢ is expected to be normally distributed with mean Δ and variance σ_d², and does not depend on μᵢ. Hence a one sample t test should be based on the dᵢ. The best estimator of the difference Δ is d̄ = Σdᵢ/n (similar to how X̄ is the best estimator of μ).

Summary of the paired t test Let t = d̄/(S_d/√n), where d̄ = Σᵢ₌₁ⁿ (xᵢ₂ − xᵢ₁)/n = Σdᵢ/n and

    S_d = √(E[(dᵢ − d̄)²]) = √( [Σdᵢ² − (Σdᵢ)²/n] / (n−1) )    (2.30)

n is the number of matched pairs. The subscript d in S_d denotes that we are using a difference variable.
• H₀ (Δ = 0) is rejected if t > t_{n−1,1−α/2} or t < t_{n−1,α/2}. Remember that t_{n−1,α/2} = −t_{n−1,1−α/2}.
• H₀ is accepted if −t_{n−1,1−α/2} ≤ t ≤ t_{n−1,1−α/2}.
• The p value of a paired t test (t = d̄/(S_d/√n)) is
  – for t < 0, p = 2 × the area to the left of t;
  – for t ≥ 0, p = 2 × the area to the right of t.

BP example (contd.): Using α = 0.05, d̄ = 4.80, S_d² = 20.844 and hence S_d = 4.566,

    t = 4.80/(4.566/√10) = 3.32

Using a t_{n−1} = t₉ distribution, t₉,₀.₉₇₅ = 2.262, so t > t_{n−1,1−α/2} and H₀ may be rejected at α = 0.05. For the p value, note that t₉,₀.₉₉₉₅ = 4.781 and t₉,₀.₉₉₅ = 3.250, and 3.32 lies in between. Hence 0.0005 < p/2 < 0.005, and therefore 0.001 < p < 0.01. The exact value of p is 0.008874337.

Interval estimation for comparison of means of two paired samples
The observed difference scores (dᵢ) are normally distributed with mean Δ and variance σ_d². Hence the sample mean difference d̄ must be normally distributed with mean Δ and variance σ_d²/n (σ_d² unknown). A two-sided 100(1−α)% confidence interval for Δ is

    [d̄ − t_{n−1,1−α/2} S_d/√n, d̄ + t_{n−1,1−α/2} S_d/√n]    (2.31)

For the BP example, d̄ = 4.80 mm Hg, S_d = 4.566 mm Hg, n = 10. For α = 0.05, the interval is 4.80 ± 3.27 = [1.5, 8.1] mm Hg.
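The BP example end-to-end (a sketch; scipy.stats.ttest_rel reproduces the hand calculation):

```python
import numpy as np
from scipy import stats

pre  = np.array([115, 112, 107, 119, 115, 138, 126, 105, 104, 115])
post = np.array([128, 115, 106, 128, 122, 145, 132, 109, 102, 117])
d = post - pre

n = len(d)
t = d.mean() / (d.std(ddof=1) / np.sqrt(n))
print(f"dbar = {d.mean():.2f}, Sd = {d.std(ddof=1):.3f}, t = {t:.2f}")

res = stats.ttest_rel(post, pre)                          # same test, two-sided
print(f"t = {res.statistic:.2f}, p = {res.pvalue:.6f}")   # p ~ 0.00887
```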

2.5.2 The two sample t test for independent samples, equal variances

Usually for cross-sectional studies instead of longitudinal data.
Example: 8 users of a BP drug have a mean BP of 132.86 mm Hg and a sample SD of 15.34 mm Hg. 21 non-users have a mean BP of 127.44 mm Hg and a sample SD of 18.23 mm Hg. Does the drug have any effect?
We assume that both sets of participants have BPs following normal distributions: set 1 is N(μ₁, σ₁²) and set 2 is N(μ₂, σ₂²). Then H₀: μ₁ = μ₂ vs. H₁: μ₁ ≠ μ₂. We assume that the underlying variances are the same: σ₁² = σ₂² = σ². From the sample data, we have X̄₁, X̄₂, S₁² and S₂².
We evaluate the difference between the two sample means, X̄₁ − X̄₂. If this difference is far from 0, we can reject H₀. The two sample means are themselves normally distributed: X̄₁ ∼ N(μ₁, σ²/n₁) and X̄₂ ∼ N(μ₂, σ²/n₂). Since X̄₁ and X̄₂ are independent random variables, X̄₁ − X̄₂ is normally distributed with mean μ₁ − μ₂ and variance σ²(1/n₁ + 1/n₂). (To subtract two independent normal variables, subtract the means and add up the variances.)

    X̄₁ − X̄₂ ∼ N(μ₁ − μ₂, σ²(1/n₁ + 1/n₂))    (2.32)

Under H₀, μ₁ = μ₂, and hence

    X̄₁ − X̄₂ ∼ N(0, σ²(1/n₁ + 1/n₂))    (2.33)

If σ² were known, then dividing X̄₁ − X̄₂ by σ√(1/n₁ + 1/n₂) would give

    (X̄₁ − X̄₂)/(σ√(1/n₁ + 1/n₂)) ∼ N(0, 1)    (2.34)

(i.e. we have used z = (x − μ)/σ).
However, σ² is usually unknown and must be estimated. We can use S₁² and S₂² to estimate σ², but not in the form of a measure like (S₁² + S₂²)/2, because the sample sizes of the two sets may be different. Noting that sample variances with more samples are probably more precise, we should weight each individual sample variance accordingly. The best estimate of σ² is the weighted average S², where

    S² = [(n₁−1)S₁² + (n₂−1)S₂²] / (n₁ + n₂ − 2)    (2.35)

Note that the weights used are the degrees of freedom. Hence we should use a t distribution with df = n₁ + n₂ − 2 rather than N(0, 1).

Summary: two sample t test, independent samples, equal variances H₀: μ₁ = μ₂ vs. H₁: μ₁ ≠ μ₂. Significance level α. σ² is assumed to be the same for both populations. Compute

    t = (X̄₁ − X̄₂)/(S√(1/n₁ + 1/n₂)),  S = √( [(n₁−1)S₁² + (n₂−1)S₂²]/(n₁ + n₂ − 2) )

• Reject H₀ if t > t_{n₁+n₂−2,1−α/2} or if t < −t_{n₁+n₂−2,1−α/2}.
• Accept H₀ if −t_{n₁+n₂−2,1−α/2} ≤ t ≤ t_{n₁+n₂−2,1−α/2}.
• The p value may be computed as follows: find t and S using the equations above. If t ≤ 0, then p = 2 × the area to the left of t in a t_{n₁+n₂−2} distribution. If t > 0, then p = 2 × the area to the right of t in a t_{n₁+n₂−2} distribution.

Interval estimation for comparison of means from two independent samples, with equal variances: To determine the 100(1−α)% CI for the true mean difference μ₁ − μ₂: if σ is known, then X̄₁ − X̄₂ ∼ N(μ₁ − μ₂, σ²(1/n₁ + 1/n₂)), or [(X̄₁ − X̄₂) − (μ₁ − μ₂)]/(σ√(1/n₁ + 1/n₂)) ∼ N(0, 1). If σ is unknown, then σ is estimated by S and

    [(X̄₁ − X̄₂) − (μ₁ − μ₂)]/(S√(1/n₁ + 1/n₂)) ∼ t_{n₁+n₂−2}

Then for a two-sided CI of 100(1−α)%,

    P(−t_{n₁+n₂−2,1−α/2} ≤ [(X̄₁ − X̄₂) − (μ₁ − μ₂)]/(S√(1/n₁ + 1/n₂)) ≤ t_{n₁+n₂−2,1−α/2}) = 1 − α

which can be rearranged to give

    (X̄₁ − X̄₂) − t_{n₁+n₂−2,1−α/2} S√(1/n₁ + 1/n₂) ≤ μ₁ − μ₂ ≤ (X̄₁ − X̄₂) + t_{n₁+n₂−2,1−α/2} S√(1/n₁ + 1/n₂)

which implies a confidence interval of

    [(X̄₁ − X̄₂) − t_{n₁+n₂−2,1−α/2} S√(1/n₁ + 1/n₂), (X̄₁ − X̄₂) + t_{n₁+n₂−2,1−α/2} S√(1/n₁ + 1/n₂)]    (2.36)
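The drug/non-drug BP example is given only as summary statistics, so the pooled test can be computed directly (a sketch; scipy's ttest_ind_from_stats performs the same computation):

```python
import numpy as np
from scipy import stats

x1, s1, n1 = 132.86, 15.34, 8      # drug users
x2, s2, n2 = 127.44, 18.23, 21     # non-users

S = np.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))  # (2.35)
t = (x1 - x2) / (S * np.sqrt(1 / n1 + 1 / n2))
p = 2 * stats.t.sf(abs(t), df=n1 + n2 - 2)
print(f"t = {t:.3f}, p = {p:.3f}")     # small |t|: no significant difference

# Equivalent one-liner:
print(stats.ttest_ind_from_stats(x1, s1, n1, x2, s2, n2, equal_var=True))
```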

2.5.3 Testing for equality of two variances

H₀: σ₁² = σ₂² vs. H₁: σ₁² ≠ σ₂², where the two samples are independent random samples from two normal distributions N(μ₁, σ₁²) and N(μ₂, σ₂²). If the sample variances are S₁² and S₂², then if S₁² looks different from S₂², we might suspect unequal variances. The best test for equality of variances is based on the ratio S₁²/S₂² (and not S₁² − S₂²). We reject H₀ if S₁²/S₂² is too small or too large.
Note that there must be a sampling distribution for S₁²/S₂² under H₀: σ₁² = σ₂². (Every sampling variable has its own sampling distribution.) This sampling distribution is called the F distribution, and is actually a family of curves depending on the numerator and denominator degrees of freedom, F_{n₁−1,n₂−1}; it is usually positively skewed. If n₁ = 2 or 3 (i.e. n₁−1 = 1 or 2), then there is a mode at 0, else the mode is > 0. The 100p-th percentile of an F distribution with d₁ and d₂ degrees of freedom is F_{d₁,d₂,p}, i.e.

    P(F_{d₁,d₂} ≤ F_{d₁,d₂,p}) = p    (2.37)

The F distribution has symmetry in a sense: under H₀, S₂²/S₁² should follow an F_{d₂,d₁} distribution. Then P(S₂²/S₁² ≥ F_{d₂,d₁,1−p}) = p, and hence

    P(S₁²/S₂² ≤ 1/F_{d₂,d₁,1−p}) = p

Under H₀, S₁²/S₂² follows an F_{d₁,d₂} distribution, and hence P(S₁²/S₂² ≤ F_{d₁,d₂,p}) = p, which implies

    F_{d₁,d₂,p} = 1/F_{d₂,d₁,1−p}    (2.38)

which is useful if the F table reports only some percentiles but not the one of interest.

The F test: H₀: σ₁² = σ₂² vs. H₁: σ₁² ≠ σ₂². The variance ratio F = S₁²/S₂² under H₀ should follow an F distribution with d₁ = n₁−1 and d₂ = n₂−1. A two-sided test is needed, to reject H₀ for both small and large values of S₁²/S₂². For a significance level α,
• reject H₀ if F > F_{n₁−1,n₂−1,1−α/2} or F < F_{n₁−1,n₂−1,α/2};
• accept H₀ if F_{n₁−1,n₂−1,α/2} ≤ F ≤ F_{n₁−1,n₂−1,1−α/2}.
(For a two-sided F test, it does not make a difference which sample is chosen for the numerator. Since most tables report variance ratios > 1, preferably use the larger variance in the numerator; else advantage may be taken of the F distribution symmetry described above.)

The p value method for the F test for equality of two variances Compute F = S₁²/S₂². Then, using an F_{n₁−1,n₂−1} distribution,
• if F ≥ 1, p = 2 · P(F_{n₁−1,n₂−1} > F)
• if F < 1, p = 2 · P(F_{n₁−1,n₂−1} < F)
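A sketch of the two-sided F test using the same BP summary statistics, with the larger sample variance in the numerator:

```python
from scipy import stats

s1, n1 = 18.23, 21      # larger sample SD goes in the numerator
s2, n2 = 15.34, 8

F = s1**2 / s2**2
p = 2 * stats.f.sf(F, dfn=n1 - 1, dfd=n2 - 1)    # F >= 1 case
print(f"F = {F:.2f}, p = {p:.3f}")   # p > 0.05: equal variances acceptable
```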

2.5.4 Two sample t test, independent samples, unequal variances

(Set 1: n₁ points, N(μ₁, σ₁²); set 2: n₂ points, N(μ₂, σ₂²).)
If the F test indicates that H₀: σ₁² = σ₂² is acceptable, use the two sample t test with equal variances, described before. For the situation where σ₁² ≠ σ₂² according to the F test, we ask if μ₁ = μ₂. (This is also called the Behrens-Fisher problem.)
To test H₀: μ₁ = μ₂ vs. H₁: μ₁ ≠ μ₂, we evaluate X̄₁ − X̄₂. Under either hypothesis, we assume that X̄₁ is normally distributed with mean μ₁ and variance σ₁²/n₁, and that X̄₂ is normally distributed with mean μ₂ and variance σ₂²/n₂. Then

    X̄₁ − X̄₂ ∼ N(μ₁ − μ₂, σ₁²/n₁ + σ₂²/n₂)    (2.39)

Under H₀, where μ₁ = μ₂, we get

    X̄₁ − X̄₂ ∼ N(0, σ₁²/n₁ + σ₂²/n₂)

If σ₁² and σ₂² are known, we can use

    Z = (X̄₁ − X̄₂)/√(σ₁²/n₁ + σ₂²/n₂)    (2.40)

Under H₀, this would be N(0, 1).
If σ₁² and σ₂² are unknown, they may be estimated by S₁² and S₂². A pooled variance cannot be computed as before, because σ₁² ≠ σ₂². Hence we try

    t = (X̄₁ − X̄₂)/√(S₁²/n₁ + S₂²/n₂)    (2.41)

The exact distribution of t under H₀ is difficult to obtain. Satterthwaite's method of handling this situation is given below.

Summary: two sample t test, independent samples, unequal variances (Satterthwaite's method): To test H₀: μ₁ = μ₂ vs. H₁: μ₁ ≠ μ₂, compute

    t = (X̄₁ − X̄₂)/√(S₁²/n₁ + S₂²/n₂)

Compute d′ = the approximate degrees of freedom

    d′ = (S₁²/n₁ + S₂²/n₂)² / [ (S₁²/n₁)²/(n₁−1) + (S₂²/n₂)²/(n₂−1) ]    (2.42)

Round d′ down to the nearest integer d″. Reject H₀ if t > t_{d″,1−α/2} or t < −t_{d″,1−α/2}. Accept H₀ if −t_{d″,1−α/2} ≤ t ≤ t_{d″,1−α/2}.
The p value may be computed from t and a t_{d″} distribution:
• if t ≤ 0, then p = 2 × the area to the left of t;
• if t > 0, then p = 2 × the area to the right of t.
The two-sided confidence interval for μ₁ − μ₂ (σ₁² ≠ σ₂²) is

    [X̄₁ − X̄₂ − t_{d″,1−α/2}√(S₁²/n₁ + S₂²/n₂), X̄₁ − X̄₂ + t_{d″,1−α/2}√(S₁²/n₁ + S₂²/n₂)]    (2.43)
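scipy's ttest_ind with equal_var=False implements this Welch/Satterthwaite test (it uses the unrounded d′, so results may differ negligibly from the round-down rule above). A sketch with invented data:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
a = rng.normal(10.0, 2.0, size=15)    # hypothetical group 1
b = rng.normal(12.0, 6.0, size=20)    # hypothetical group 2, larger variance

v1, v2 = a.var(ddof=1) / len(a), b.var(ddof=1) / len(b)
t = (a.mean() - b.mean()) / np.sqrt(v1 + v2)
dprime = (v1 + v2) ** 2 / (v1**2 / (len(a) - 1) + v2**2 / (len(b) - 1))  # (2.42)
print(f"t = {t:.3f}, d' = {dprime:.1f}")

print(stats.ttest_ind(a, b, equal_var=False))    # Welch's test
```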

Estimation of sample size for comparing two means:
• Sample size needed for comparing means when the two samples are of equal size, using a two-sided test, significance level α, power = 1 − β: the sample size n of each group is (using Δ = |μ₂ − μ₁|)

    n = (σ₁² + σ₂²)(Z_{1−α/2} + Z_{1−β})² / Δ²    (2.44)

The two groups are N(μ₁, σ₁²) and N(μ₂, σ₂²).
• Sample size needed for comparing means when the two samples are of unequal size, using a two-sided test, significance level α, power 1 − β: for the two groups, using Δ = |μ₁ − μ₂| and assuming the two groups to be ∼ N(μ₁, σ₁²) and N(μ₂, σ₂²),

    n₁ = (σ₁² + σ₂²/k)(Z_{1−α/2} + Z_{1−β})² / Δ²
    n₂ = (kσ₁² + σ₂²)(Z_{1−α/2} + Z_{1−β})² / Δ²    (2.45)

where k = n₂/n₁ = the projected ratio of the two sample sizes. (For k = 1, this simplifies to the equation above for samples of equal sizes.)

Estimation of power: Comparison of means of two samples, two-sided test, significance level α. To test H₀: μ₁ = μ₂ vs. H₁: μ₁ ≠ μ₂, and in particular |μ₁ − μ₂| = Δ, the power is

    Φ( −Z_{1−α/2} + Δ√n₁ / √(σ₁² + σ₂²/k) )    (2.46)

where the two groups are assumed to be ∼ N(μ₁, σ₁²) and N(μ₂, σ₂²) and k = n₂/n₁ is the projected ratio of sample sizes.

2.6 Non-parametric methods

These must be used when the underlying distribution is not apparent, or when the sample size is too small (i.e. the CLT may not hold). They are useful in handling ordinal data (grades etc.), but non-parametric methods can also be used for cardinal data if the normality of the underlying distribution is doubtful.

2.6.1 Sign test

For two subjects A and B, evaluate whether the score for A is >, =, or < the score for B. The relative magnitude of the differences may be unknown. Two methods: the normal theory method and the exact method.

Normal theory method
Let xᵢ = the score for A and yᵢ = the score for B for the i-th sample. Then dᵢ = xᵢ − yᵢ and we test

    H₀: Δ = 0 vs. H₁: Δ ≠ 0

where Δ = the population median of the dᵢ = the 50th percentile of the underlying distribution of the dᵢ. (The actual underlying distribution of dᵢ may not/cannot be observed: just dᵢ > 0, dᵢ = 0, or dᵢ < 0.) Calculate

    c/n = (number of scores with dᵢ > 0) / (number of scores with dᵢ ≠ 0)

If c is large, score A > score B. (If a high score is good, A is preferred.) Under H₀, P(nonzero dᵢ > 0) = 1/2. Assuming that the normal approximation to the binomial is valid (npq ≥ 5) gives

    n (1/2)(1/2) ≥ 5, or n ≥ 20    (2.47)

where n = the number of nonzero dᵢ.


Summary of the two-sided α level sign test

H₀: Δ = 0 vs. H₁: Δ ≠ 0.
If the number of nonzero dᵢ is n ≥ 20, and the number of dᵢ with dᵢ > 0 is c, then reject H₀ if

    c > c₂ = n/2 + 1/2 + Z_{1−α/2}√(n/4)  or  c < c₁ = n/2 − 1/2 − Z_{1−α/2}√(n/4)

The p value for the sign test (normal theory method):

    p = 2 [1 − Φ( (c − n/2 − 1/2)/√(n/4) )]  if c > n/2
    p = 2 Φ( (c − n/2 + 1/2)/√(n/4) )        if c < n/2

Alternative formula for the p value:

    p = 2 [1 − Φ( (|C − D| − 1)/√n )]

where C = the number of dᵢ > 0 and D = the number of dᵢ < 0. (This is called the sign test because it only looks at the sign of each difference.)
The sign test is a special case of the one sample binomial test: H₀: p = 1/2 vs. H₁: p ≠ 1/2, with a large number of samples assumed. Assuming that the normal approximation to the binomial is valid, under H₀: p = 1/2, E[c] = np = n/2 and Var[c] = npq = n/4, so c ≈ N(n/2, n/4).
Example: Two ointments preventing sunburn are to be evaluated. Ointment A is applied on one arm, B on the other, and we measure redness for 45 people. 22 are better off on arm A, 18 on arm B, and 5 equally well off. Hence there are 40 untied pairs and c = 18 < n/2 = 20. With Z₀.₉₇₅ = 1.96,

    c₂ = n/2 + 1/2 + Z_{1−α/2}√(n/4) = 20 + 0.5 + 1.96√10 = 26.7
    c₁ = n/2 − 1/2 − Z_{1−α/2}√(n/4) = 13.3

Since 13.3 ≤ c = 18 ≤ 26.7, H₀ is accepted using a two-sided test at α = 0.05.

    p = 2 Φ( (18 − 20 + 1/2)/√(40/4) ) = 2 Φ(−0.47) = 2 × 0.3176 = 0.635

The result is not statistically significant, and hence both ointments are considered equally effective.
Other method:

    z = (|C − D| − 1)/√n = (|18 − 22| − 1)/√40 = 3/√40 = 0.47  ⟹  p = 2 [1 − Φ(0.47)] = 0.635

Sign test: exact method
This is a special case of the one sample binomial test, where for small samples we test H₀: p = 1/2 vs. H₁: p ≠ 1/2. If n < 20, we need to use exact binomial probabilities rather than the normal approximation:
If c > n/2, p = 2 Σ_{k=c}^{n} C(n,k)(1/2)ⁿ. If c < n/2, p = 2 Σ_{k=0}^{c} C(n,k)(1/2)ⁿ. If c = n/2, p = 1.0.
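The ointment example as a binomial test (a sketch; the exact test and the normal approximation agree closely here; scipy.stats.binomtest requires scipy ≥ 1.7):

```python
import numpy as np
from scipy import stats

C, D = 18, 22               # counts of d_i > 0 and d_i < 0; ties dropped
n = C + D                   # 40 untied pairs

# Normal approximation with continuity correction
z = (abs(C - D) - 1) / np.sqrt(n)
p_norm = 2 * (1 - stats.norm.cdf(z))
print(f"z = {z:.2f}, p (normal) = {p_norm:.3f}")     # ~0.635

# Exact sign test = binomial test of p = 1/2
print("p (exact) =", round(stats.binomtest(C, n, 0.5).pvalue, 3))
```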

2.6.2 Wilcoxon signed rank test

Example: Two ointments A and B are available to treat burns. The degree of burn is quantified on a 10-point scale: 10 = worst burn, 1 = no burn at all. (The signed rank test is useful if data is quantified on a scale, e.g. a 10-point scale like grades.) Apply one ointment on each arm of patient i and evaluate: xᵢ = the degree of burn with ointment A, yᵢ = the degree of burn with ointment B, and dᵢ = xᵢ − yᵢ. Hence if dᵢ > 0, B is better than ointment A. If dᵢ = +5, redness with ointment A is > redness with ointment B.

    |dᵢ|   fᵢ (dᵢ < 0)   fᵢ (dᵢ > 0)   # with same |dᵢ|   Range of ranks   Average rank
    10     0             0             0                  –                –
    9      0             0             0                  –                –
    8      1             0             1                  40               40.0
    7      3             0             3                  37–39            38.0
    6      2             0             2                  35–36            35.5
    5      2             0             2                  33–34            33.5
    4      1             0             1                  32               32.0
    3      5             2             7                  25–31            28.0
    2      4             6             10                 15–24            19.5
    1      4             10            14                 1–14             7.5
    Total  22            18            (plus 5 people with dᵢ = 0)

The number of people with dᵢ > 0 is 18 (they find B better). The number of patients with dᵢ < 0 is 22 (they find A better). But the negative dᵢ have much greater absolute values than the positive dᵢ, so probably ointment A is better than B. We test

    H₀: Δ = 0 vs. H₁: Δ ≠ 0

where Δ = the median score difference between the two arms. If Δ < 0, ointment A is better. Doing a paired t test on the table above is not OK because the rating scale is ordinal. We need a nonparametric test analogous to the paired t test.

Ranking procedure for the Wilcoxon signed rank test First arrange the differences dᵢ in order of absolute value. Then count the number of differences with the same absolute value. Ignore observations with dᵢ = 0, and rank the remaining observations from 1 to n based on their absolute values (n is then the highest rank). Let R = the highest rank used prior to the current group, and let G = the number of occurrences (differences) in this group. Then the lowest rank in the range is R + 1 and the highest rank in the range is R + G. The average rank in the group is (lowest rank + highest rank)/2.
The sum of ranks = rank sum = R₁ for the group of people with positive dᵢ. (dᵢ > 0 implies that ointment A does worse than ointment B.) If the null hypothesis is true,

    E[R₁] = n(n+1)/4,  Var[R₁] = n(n+1)(2n+1)/24    (2.48)

If n ≥ 16, the normal approximation may be used; n = the number of nonzero differences.

Test procedure: normal approximation, two-sided, significance level α Rank the differences using the procedure provided above. Compute the rank sum R₁ of the positive differences. If there are no ties (i.e. no group of differences with the same absolute value),

    T = (|R₁ − n(n+1)/4| − 1/2) / √( n(n+1)(2n+1)/24 )

If there are ties, let tᵢ refer to the number of differences with the same absolute value in the i-th tied group, and let g = the number of tied groups:

    T = (|R₁ − n(n+1)/4| − 1/2) / √( n(n+1)(2n+1)/24 − Σᵢ₌₁ᵍ (tᵢ³ − tᵢ)/48 )    (2.49)

If T > Z_{1−α/2}, then reject H₀. The p value for the test is 2 × (1 − Φ(T)). (Use this procedure only if n ≥ 16. The difference scores are assumed to have an underlying continuous symmetric distribution.)

Burns example: n = the number of nonzero differences = 22 + 18 = 40 ≥ 16, so we can use the normal approximation. Compute the rank sum for people with dᵢ > 0:

    R₁ = 10(7.5) + 6(19.5) + 2(28.0) = 248

and E[R₁] = 40 × 41/4 = 410, while

    Var[R₁] = 40 × 41 × 81/24 − [(14³−14) + (10³−10) + (7³−7) + (2³−2) + (2³−2) + (3³−3)]/48
            = 5535 − 4092/48 = 5449.75

    sd(R₁) = √5449.75 = 73.82

and hence

    T = (|248 − 410| − 1/2)/73.82 = 161.5/73.82 = 2.19

The p value of the test = 2[1 − Φ(2.19)] = 2 × [1 − 0.9857] = 0.029. The observed rank sum (248) is smaller than the expected rank sum (410), so ointment A does better than ointment B. (Remember that the sign test did not report any significant difference!)
If the test is performed on the negative difference scores, with R₂ = the rank sum of negative differences, then (it does not matter whether you focus on R₁ or R₂)

    R₂ = 4(7.5) + 4(19.5) + 5(28.0) + 1(32.0) + 2(33.5) + 2(35.5) + 3(38.0) + 1(40.0) = 572

    |R₂ − n(n+1)/4| − 0.5 = |572 − 410| − 0.5 = 161.5 = |R₁ − n(n+1)/4| − 0.5

Var[R₁] = Var[R₂], so the same test statistic T (= 2.19) and p value (0.029) are obtained.
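The burns example reconstructed from the frequency table (a sketch in plain numpy; it reproduces the hand-computed T = 2.19, and scipy.stats.wilcoxon on the raw dᵢ would give a comparable asymptotic result):

```python
import numpy as np

# (|d|, count with d < 0, count with d > 0) from the table; d = 0 dropped
groups = [(1, 4, 10), (2, 4, 6), (3, 5, 2), (4, 1, 0),
          (5, 2, 0), (6, 2, 0), (7, 3, 0), (8, 1, 0)]

n = sum(neg + pos for _, neg, pos in groups)           # 40 nonzero d_i
r, R1, tie_term = 0, 0.0, 0.0
for _, neg, pos in groups:                             # ranks from smallest |d|
    t_i = neg + pos
    avg_rank = r + (t_i + 1) / 2                       # average rank of group
    R1 += pos * avg_rank                               # rank sum of d_i > 0
    tie_term += (t_i**3 - t_i) / 48
    r += t_i

var = n * (n + 1) * (2 * n + 1) / 24 - tie_term
T = (abs(R1 - n * (n + 1) / 4) - 0.5) / np.sqrt(var)
print(f"R1 = {R1:.0f}, Var = {var:.2f}, T = {T:.2f}")  # 248, 5449.75, 2.19
```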

2.6.3 Wilcoxon rank sum test (Mann-Whitney test, U test)

For a rank sum test with independent samples, we need a hypothesis test of medians. (The Wilcoxon signed rank test is the nonparametric equivalent of the paired t test; the Wilcoxon rank sum test is the nonparametric equivalent of the t test for 2 independent samples.)

Example: Eyesight in people with dominant (DOM) and sex-linked (SL) retinitis pigmentosa. The t test is inapplicable because visual acuity is not a specific numerical value.

    Visual acuity   DOM   SL   Combined sample   Range of ranks   Average rank
    20/20           5     1    6                 1–6              3.5
    20/25           9     5    14                7–20             13.5
    20/30           6     4    10                21–30            25.5
    20/40           3     4    7                 31–37            34.0
    20/50           2     8    10                38–47            42.5
    20/60           0     5    5                 48–52            50.0
    20/70           0     2    2                 53–54            53.5
    20/80           0     1    1                 55               55.0
    Total           25    30   55

Ranking procedure for the Wilcoxon rank sum test
• Combine the data from the two groups. Order the values from lowest to highest (or from best to worst). Then assign ranks (best = low rank etc.). Compute the range of ranks for each group, and then assign the average rank to every observation in a tied group.
• The test statistic = the rank sum in the first sample = R₁. (If R₁ is large, the dominant group has poor eyesight.)
• If the numbers of observations in the two groups are n₁ and n₂, the average rank in the combined sample = (1 + n₁ + n₂)/2. Then, under H₀, E[R₁] = n₁ × the average rank of the combined sample:

    E[R₁] = n₁(1 + n₁ + n₂)/2,  Var[R₁] = n₁n₂(1 + n₁ + n₂)/12    (2.50)

• Assume that the smaller group is of size at least 10, and that the variable under study has an underlying continuous distribution; then R₁ ≈ Normal.

Test procedure: normal approximation, two-sided, level α Rank all observations as discussed above. Then compute the rank sum R₁ in the first sample (the choice of R₁ is arbitrary). If there are no ties, compute

    T = (|R₁ − n₁(n₁ + n₂ + 1)/2| − 1/2) / √( n₁n₂(n₁ + n₂ + 1)/12 )

and if there are any ties, compute

    T = (|R₁ − n₁(n₁ + n₂ + 1)/2| − 1/2) / √( (n₁n₂/12)(n₁ + n₂ + 1 − S) ),
    S = Σᵢ₌₁ᵍ tᵢ(tᵢ² − 1) / [(n₁ + n₂)(n₁ + n₂ − 1)]    (2.51)

where g is the number of tied groups and tᵢ = the number of observations in the i-th tied group. If T > Z_{1−α/2}, we reject H₀. Compute the p value as p = 2 × [1 − Φ(T)]. (Use the test only if n₁, n₂ ≥ 10 and if there is an underlying continuous distribution.)
Example: Since the minimum sample size = 25 ≥ 10, we use the normal approximation.

    R₁ = 5(3.5) + 9(13.5) + 6(25.5) + 3(34.0) + 2(42.5) = 479

E[R₁] = 25 × 56/2 = 700 (note that R₁ < E[R₁]!), and Var[R₁], corrected for ties, is

    Var[R₁] = (25 × 30/12)(56 − A/(55 × 54)) = 62.5 (56 − 5382/2970) = 3386.74

where

    A = 6(6²−1) + 14(14²−1) + 10(10²−1) + 7(7²−1) + 10(10²−1) + 5(5²−1) + 2(2²−1) + 1(1²−1) = 5382

    T = (|479 − 700| − 0.5)/√3386.74 = 3.79  ⟹  p = 2 × [1 − Φ(3.79)] < 0.001

The visual acuity of the two groups is significantly different.
Comments:
• If n₁ or n₂ < 10, use a table of exact significance levels.
• The H test or Kruskal-Wallis test is a generalization of the U test, used to check whether K independent samples come from identical populations.
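A sketch of the same example via scipy, expanding the frequency table into coded acuity levels; mannwhitneyu's asymptotic method (scipy ≥ 1.7) applies the same tie and continuity corrections, so its p value should match the hand calculation up to rounding:

```python
import numpy as np
from scipy import stats

levels = np.arange(1, 9)                       # 20/20 ... 20/80 coded 1..8
dom = np.repeat(levels, [5, 9, 6, 3, 2, 0, 0, 0])
sl  = np.repeat(levels, [1, 5, 4, 4, 8, 5, 2, 1])

# Rank sum R1 for the DOM group from the combined ranking
combined = np.concatenate([dom, sl])
ranks = stats.rankdata(combined)               # average ranks for ties
R1 = ranks[:len(dom)].sum()
print("R1 =", R1)                              # 479.0

res = stats.mannwhitneyu(dom, sl, alternative="two-sided", method="asymptotic")
print(f"U = {res.statistic:.0f}, p = {res.pvalue:.5f}")   # p < 0.001
```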
