Testing Hypotheses

1. How do we use a sample to decide whether a population possesses a
particular characteristic?

2. How do we determine how likely or unlikely it is that a particular sample
has come from a particular population?

All managerial decisions are based on certain beliefs.

We believe that an area consists of higher income groups. We decide our
marketing strategy based on this.

We believe that our process does not produce more than ---- defects per
batch. We plan our raw material requirements and delivery dates based on
this assumption.

We believe that the attrition rate in our company is not more than say 15%.
We plan our recruitment schedule accordingly.

We believe that our collections this month will be around 40% of our
outstandings. We plan our cash flow based on this.

These are all assumptions, beliefs or premises based on which we take our
decisions.

Such a premise is called a hypothesis. How do we know whether our
assumption is correct? A statement about a population parameter that has to
be checked is called a hypothesis.

Suppose we take a sample and find the sample statistic. If this is far below
or far above our assumption, then we might conclude that our hypothesis
was incorrect. But what if the difference between the observed value of,
say, the sample mean and the assumed population parameter is not much?
We cannot be absolutely certain.
In such cases, we can neither reject nor accept the hypothesis outright.
Instead, the decision-making process has to be objective, based on the
information provided by the sample.
We cannot jump to conclusions based on one sample alone ('students' guide'
example). We also have to ensure that our decision making is aimed at
situations which are under comparable conditions ('lawn mower' example).

Concepts of hypothesis testing: Al sheet example. µ = 0.04; σ = 0.004;
n = 100; x bar = 0.0408.
The probability of getting a sample whose mean is at least 2 standard errors
away from the hypothesized mean is only 4.5%. If this probability is
considered very low, then we might come to the conclusion that the
originally assumed value of the mean, i.e., 0.04, is not correct and we may
end up rejecting the consignment. Our reason for rejection would have
been: the difference between the assumed population mean and the sample
mean is very large, and hence the chance that this population would
produce such a sample is very low.
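A minimal sketch, in Python, of the calculation just described for the Al sheet example (scipy is assumed to be available and is used only as a normal-probability table):

# Al sheet example: how far is the observed sample mean from the
# hypothesized mean, in standard errors, and how likely is a gap
# at least that large if the hypothesis is true?
from math import sqrt
from scipy.stats import norm

mu_h0, sigma, n, x_bar = 0.04, 0.004, 100, 0.0408

se = sigma / sqrt(n)                 # standard error of the mean = 0.0004
z = (x_bar - mu_h0) / se             # 2.0 standard errors away
p_two_tailed = 2 * norm.sf(abs(z))   # P(at least 2 SE away) ~ 0.0455, about 4.5%

print(f"z = {z:.2f}, probability of a gap at least this large = {p_two_tailed:.4f}")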

How low is low? To define this, one should understand that our minimum
standard for an acceptable probability is also the risk that we take of
rejecting a good hypothesis/consignment. Depending on the situation, this
acceptable level of probability (α) can assume different values. If we want
to reduce the probability of rejecting a true hypothesis, we will go for a low
value of α. In the above example, if the acceptable level of probability is
say 2%, then we would have accepted the above consignment, since the
observed 4.5% chance is not below that cut-off.

Null and Alternative Hypotheses.

The null hypothesis is the assumption that we want to test; it is denoted by H0.

For every null hypothesis that we define there has to be an alternate
hypothesis, denoted by H1, such that when the null hypothesis is not true,
the alternate hypothesis is true.
µH0 is the hypothesized value of the mean. If our sample does not support
this value, then we reject the null hypothesis and accept the alternate
hypothesis H1 as true.
For H0: µ = µH0 = .04, the alternate hypothesis could be
H1 : µ ≠ .04 or H1 : µ < .04 or H1 : µ > .04

The purpose of hypothesis testing is not to question the computed value of
the sample statistic but to make a judgement about the difference between
the sample statistic and the assumed population parameter.
Significance levels: the criterion to accept or reject a null hypothesis. In the
above problem, we thought that the probability of x bar being at least 2
standard errors away from µ, which was 0.045, was very low and hence we
rejected the original assumption about the population parameter. This 4.5%
is called the significance level. Suppose we want to test the above
hypothesis at a significance level of 5%; what do we mean? We will reject
the null hypothesis if the difference between x bar and µ is so large that it,
or a larger difference, would occur, on average, only 5 or fewer times in
every 100 samples when the null hypothesis is correct.

If we assume that the null hypothesis is correct, then the significance level
indicates the percentage of sample means that lies outside certain limits.
At a 5% significance level, there is a 95% area under the normal curve
where there is no significant difference between x bar and µ, and in the
remaining 5% area there is a significant difference between x bar and µ.
Hence if the sample mean falls within the 95% area, we accept the null
hypothesis, while H0 is rejected if the sample mean falls in the 5% area.
We should understand that when we accept the null hypothesis, we do not
say that the population mean will assume the value of the hypothesized
mean; we only say that there is no statistical evidence to prove that H0 is
wrong. So when sample data does not give us reasons to reject a null
hypothesis, we accept it.
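A minimal sketch of this accept/reject rule at the 5% significance level, applied to the Al sheet figures from earlier (the helper function decide is illustrative, not part of the notes):

# Accept H0 when x_bar lies within mu +/- 1.96 standard errors (the 95% area);
# reject when it falls in the remaining 5% area.
from scipy.stats import norm

alpha = 0.05
z_crit = norm.ppf(1 - alpha / 2)     # about 1.96 for a two tailed test

def decide(x_bar, mu_h0, se):
    z = (x_bar - mu_h0) / se
    return "accept H0" if abs(z) <= z_crit else "reject H0"

print(decide(0.0408, 0.04, 0.0004))  # z = 2.0 > 1.96, so reject at the 5% level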

Choice of a significance level.

How do we choose α? As α is the probability that the sample mean takes a
value beyond some acceptable limit, we know that it is the probability of
rejecting a null hypothesis when it is true.
Hence, while a very small value of α such as 0.5% (.005) will mean that our
chance of rejecting a null hypothesis when it is true is only 0.5%, a higher
value of α such as 10% (0.1) will mean that the chance of our rejecting a
true null hypothesis is higher, at 10%. Generally a value of 0.05 is used,
while in special cases 0.1 and 0.01 are also used. For different values of α
we will see that our decision about accepting or rejecting a null hypothesis
can change.

Type 1 and Type 2 errors.


The error committed in rejecting a true null hypothesis is called a Type 1
error, and the probability of committing a Type 1 error is α.
Similarly, the error committed in accepting a hypothesis that is not true is
called a Type 2 error, and the probability of committing this error is
generally termed β. For higher values of α, β will generally assume lower
values and vice versa; hence there is a tradeoff between these two errors.
The value of α is chosen depending on whether one wants to avoid a Type 1
error or a Type 2 error.
For example, a low value of α, say 0.01, will be chosen when one does not
want to commit a Type 1 error (M/C engine assembly example).

Similarly, a high value of α, say 0.1, will be chosen if one does not want to
commit a Type 2 error, i.e., when one does not want to accept the null
hypothesis when it is false (chemicals in drugs example).
Hence the situation decides the value of α.

What distributions to choose?

NORMAL when n > 30.

't' distribution when n < 30, σ is not known, and the parent distribution is
known to be approximately normal. Also, when n > 5% of N, the finite
population correction factor is to be used while calculating the standard
error of the sampling distribution.
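A minimal sketch of the standard error calculation with the finite population correction described above; the function and the numbers are illustrative assumptions, not figures from the notes:

from math import sqrt

def std_error(sigma, n, N=None):
    # Standard error of the mean; apply the finite population correction
    # when the sample is more than 5% of the population.
    se = sigma / sqrt(n)
    if N is not None and n > 0.05 * N:
        se *= sqrt((N - n) / (N - 1))
    return se

print(std_error(sigma=4000, n=100))          # large (effectively infinite) population
print(std_error(sigma=4000, n=100, N=1000))  # n is 10% of N, so the FPC is applied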

One tailed and two tailed tests.

There are times when we do not want the sample statistic to take a value
which is lower than the assumed value. In such cases we will formulate the
alternate hypothesis as
H1 : µ < µH0. ex : weight of items packed in a container sold by weight.

Similarly in case we do not want the sample statistic to take a value higher
than the assumed value, the alternate hypothesis will take the form
H1 : µ > µH0. ex : no of defects in a batch

In case we want the sample statistic to be neither too high nor too low
compared to the assumed value, the alternate hypothesis will take the form
H1 : µ ≠ µH0. Ex: diameter of a piston in an engine assy.

In the first two cases, the significance levels correspond to only one side of
the normal curve about the mean, while in the third case the significance
levels correspond to both the sides of the normal curve about the mean.
Hence in the first case we will reject the null hypothesis only if the sample
statistic is in the left tail of the normal curve ( shaded area ) and in the
second case, we will reject H0 only if the sample statistic is in the right tail
of the normal curve ( shaded area ). These kinds of tests are called one
tailed tests. The 3rd case where H0 will be rejected if the sample statistic
falls in either of the tails ( shaded area ), is called a two tailed test.

We have to decide on the kind of test to perform to get the desired result,
based on the situation as described above.

This has to be decided prior to the sampling process and not after looking
at the sample values as this would lead to erroneous conclusions.

Hypothesis Testing of Means:

Steps for conducting a hypothesis test:

1. Formulate the null and alternate hypotheses after deciding whether to
conduct a one tailed test or a two tailed test.
2. Select an appropriate significance level.
3. Decide on the distribution to use and mark the critical region.
4. Calculate the standard error of the sample statistic and convert the
observed value of the sample statistic to a standard score.
5. Locate the standardized sample score on the sketched distribution.
6. Accept or Reject the null hypothesis based on the above.

Two Tailed Tests: the assumed mean is 80000; the standard deviation of the
sample is 4000; the mean of a sample of size 100 is 79600; α = 0.05.
We want to check whether the assumption is correct.
When we accept a null hypothesis, what we mean is that the sample mean
observed is not significantly far away from the hypothesized value of the
population mean and hence the observed sample could have come from the
population under study. Similarly, rejecting a null hypothesis means that the
observed value of a sample mean is significantly far away from the
hypothesized mean of the population and hence the sample chosen could
not have come from a population with the assumed value of the population
mean.
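A minimal sketch of this two tailed test, following the six steps listed above (scipy is used only for the normal-table value):

from math import sqrt
from scipy.stats import norm

mu_h0, s, n, x_bar, alpha = 80000, 4000, 100, 79600, 0.05

se = s / sqrt(n)                    # step 4: standard error = 400
z = (x_bar - mu_h0) / se            # standard score = -1.0
z_crit = norm.ppf(1 - alpha / 2)    # step 3: critical value, about 1.96

if abs(z) <= z_crit:                # step 6
    print(f"z = {z:.2f}: accept H0, no significant difference")
else:
    print(f"z = {z:.2f}: reject H0")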

One Tailed Tests: drug dosage example and defects in a batch example.

Measuring the power of a test:

What do you expect a good hypothesis test to do?
To minimize both Type 1 and Type 2 errors. But we know that there is a
trade-off between these two errors. We keep α low if we want to minimize
Type 1 error, but this might result in the probability of committing a Type 2
error, β, assuming a higher value.
Ideally we want to keep both α and β low. But once we choose a particular
value of α, the probability of a Type 1 error occurring is fixed; there is
nothing much we can do about it.
What about β? By definition, β is the probability of accepting a null
hypothesis when it is not true. So (1 – β) is the probability that we will not
accept a false hypothesis. A good test is one which maximizes this
probability, and hence we call (1 – β) the power of a hypothesis test.
So a high value of 1 – β would ensure that the test rejects a false null
hypothesis, while a low value of 1 – β could mean that many false
hypotheses will be accepted and hence, perhaps, that the test is not working
at all! It is for this reason that 1 – β is called the power of a hypothesis test.

If we plot the values for 1-β for each value of µ for which H1 is true, we
get a curve called the Power Curve.

We will note that as the actual population mean gets closer to the originally
hypothesized mean, the power of the test reduces. In fact, when the actual
population mean is exactly equal to the hypothesized mean, we will find
that 1 – β equals α, which is what we expect (the probability of rejecting a
true hypothesis).

So, if the actual situation is very bad or unsatisfactory (the true mean is far
from the hypothesized one), our test is good; but as the situation gets better
and better, the power of the test reduces and the test becomes 'not so good'
after all! This uncertainty is the cost that we have to pay for sampling and
its associated errors. It is because of these errors that hypothesis tests do not
perform perfectly.
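A minimal sketch of a power curve, using the two tailed test from earlier (H0: µ = 80000, standard error 400, α = 0.05); the list of 'true' means is illustrative:

from scipy.stats import norm

mu_h0, se, alpha = 80000, 400, 0.05
z_crit = norm.ppf(1 - alpha / 2)
lower, upper = mu_h0 - z_crit * se, mu_h0 + z_crit * se   # acceptance limits

def power(true_mu):
    # Probability that x_bar falls outside the acceptance region
    # when the true population mean is true_mu, i.e. 1 - beta.
    return norm.cdf(lower, loc=true_mu, scale=se) + norm.sf(upper, loc=true_mu, scale=se)

for true_mu in (78800, 79200, 79600, 80000, 80400, 80800, 81200):
    print(true_mu, round(power(true_mu), 3))
# The power shrinks towards alpha (0.05) as the true mean approaches 80000.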

Hypothesis tests of proportions:

Proportions generally correspond to binomial distributions; but as n
increases, the binomial distribution approaches the normal distribution in
its characteristics, and we can use the normal distribution as the sampling
distribution provided np, nq ≥ 5.
Example: employees' 'promotable' problem.
P is assumed to be .8; for n = 150, only 105 employees are found to be
'promotable'. At α = 0.05, comment on the original assumption.
Example of proportion of defects in a batch: the assumed proportion is .04;
n = 400, p bar is .0402; should we accept the batch at a significance level of
.05? In another batch of 400 pieces the proportion of defectives was found
to be .06; comment.
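A minimal sketch of the 'promotable' example above (the defects-in-a-batch examples follow the same pattern):

from math import sqrt
from scipy.stats import norm

p_h0, n, successes, alpha = 0.80, 150, 105, 0.05

p_bar = successes / n                    # 0.70
se = sqrt(p_h0 * (1 - p_h0) / n)         # about 0.033, using the hypothesized proportion
z = (p_bar - p_h0) / se                  # about -3.06
z_crit = norm.ppf(1 - alpha / 2)         # 1.96

print("reject H0" if abs(z) > z_crit else "accept H0", f"(z = {z:.2f})")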

Shop floor efficiency example: the accepted proportion of rejections = .8.
After the process was improved by incorporating the latest technologies,
the management wanted to know whether the original accepted level of
rejections had come down. A sample of 100 was taken and 74 were found
to be rejected. Working at a significance level of .1, can the management
decide to enforce better efficiency levels?

Hypothesis test of the mean when σ is not known: if n > 30, the normal
distribution can be used and σ hat can be estimated from s.

If n < 30, the 't' statistic is to be used with n − 1 degrees of freedom.

A two tailed test is done the same way; for a one tailed test, care should be
taken while reading the value of 't' from the tables: for a one tailed α = .05,
the value under the .10 (two tailed) column should be taken.
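A small check of this point, assuming scipy's t distribution is available in place of the printed table:

from scipy.stats import t

df = 9
print(t.ppf(1 - 0.05, df))        # one tailed critical value at alpha = 0.05: 1.833
print(t.ppf(1 - 0.10 / 2, df))    # the .10 two tailed column gives the same 1.833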

Two Sample tests:

Sampling distribution of the difference between the means/proportions.
Suppose we have two populations P1 and P2 and we want to see how these
two populations are related; what do we do?
We take a random sample of size n1 from P1 and another random sample of
size n2 from P2, and we find x1 bar − x2 bar.
If we now construct a distribution of these differences (by choosing all
possible sample mean differences), we end up with what we call the
'Sampling Distribution of the Difference between the Means'.

The mean of this distribution will obviously be
µ(x1 bar − x2 bar) = µ(x1 bar) − µ(x2 bar) = µ1 − µ2
The standard deviation of this distribution, i.e., the standard error of the
sampling distribution of the difference between the means, is given by
σ(x1 bar − x2 bar) = square root of ( σ1²/n1 + σ2²/n2 )

If the standard deviations are not known, then estimates can be used.

Tests for differences between means, large sample sizes:

Two independent samples from two populations give the following data:
Mean1 = 32.3, s1 = 3, n1 = 42
Mean2 = 34, s2 = 4, n2 = 57
Using α = 0.05, can we say that the second population has a higher mean?
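A minimal sketch of this large-sample test; since H1 is that the first mean is lower, it is a left tailed test:

from math import sqrt
from scipy.stats import norm

x1, s1, n1 = 32.3, 3, 42
x2, s2, n2 = 34.0, 4, 57
alpha = 0.05

se = sqrt(s1**2 / n1 + s2**2 / n2)   # about 0.70
z = (x1 - x2) / se                   # about -2.42
z_crit = norm.ppf(alpha)             # about -1.645 for a left tailed test

if z < z_crit:
    print(f"z = {z:.2f}: reject H0, the second population mean is higher")
else:
    print(f"z = {z:.2f}: accept H0")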

Examples from the previous exercise.

Small samples:
The calculation of the standard error is different, and the 't' distribution is
to be used. We assume that the two population variances are equal.

We use a weighted average of the sample standard deviations to estimate
the standard error of the distribution of the differences between the means.
This estimate of the standard error is called the pooled estimate.
Edn programme example: p1 and p2, which is better?
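Since the Edn programme figures are not reproduced here, a minimal sketch of the pooled estimate with illustrative numbers (the sample values below are assumptions, not data from the notes):

from math import sqrt
from scipy.stats import t

def pooled_t_test(x1, s1, n1, x2, s2, n2, alpha=0.05):
    # Pooled (weighted average) estimate of the common variance, then the
    # standard error of the difference and a two tailed t test.
    sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)
    se = sqrt(sp2) * sqrt(1 / n1 + 1 / n2)
    t_stat = (x1 - x2) / se
    t_crit = t.ppf(1 - alpha / 2, df=n1 + n2 - 2)
    return t_stat, t_crit, abs(t_stat) > t_crit

print(pooled_t_test(x1=26.0, s1=4.0, n1=12, x2=22.5, s2=3.5, n2=10))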

Dependent samples:
In the previous cases, the choice of the first sample had no effect on the
choice of the second sample.

What do we do if the samples are dependent?
Example: the efficiency of a programme in increasing performance.

We cannot check one set of people for their performance before the
programme and a different set of people after undergoing the programme.
In such cases we perform 'paired difference tests', which are different from
tests of the difference between two independent samples.

Agri farm, software efficiency examples.

Weight loss programme: 10 people were tested both before and after going
through the programme and their respective weights (in lbs) are as below.

Before 189 202 220 207 194 177 193 202 208 233

After 170 179 203 192 172 161 174 187 186 204

The programme promised to reduce weight by at least 17 lbs.
Is the programme effective (at the .05 α level)?
We perform a right tailed test.

Sample of 10 people’s weights checked as above.

Here we first prepare a distribution of the weight losses.
The mean weight loss is found to be 19.7 lbs.
The standard deviation of the weight losses is found to be 4.4 lbs.
Using this as an estimate of the standard deviation for all participants, the
standard error of the mean weight loss can be taken as 4.4 ÷ √10 = 1.39.
Now, H0 : µH0 = 17
H1 : µ > 17
The observed value of the mean loss is 19.7.
The standard score of this value is (19.7 − 17) ÷ 1.39 = 1.94.

From the 't' table, the value for 9 degrees of freedom under the .10 column
(one tailed α = 0.05) is 1.833, so the acceptance region is to the left of 1.833.

Hence the observed mean loss falls in the rejection region.

The null hypothesis is rejected, and we conclude that the mean loss is
significantly more than the promised one; hence it is a programme worth
trying.
We would have arrived at an entirely different conclusion had we
considered the above as independent samples.
In a paired test the standard error of the difference between the means is
smaller than in the test of two independent samples, and hence the above
result.
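A minimal sketch of this paired difference test with the weights above (scipy's t table supplies the critical value):

from math import sqrt
from statistics import mean, stdev
from scipy.stats import t

before = [189, 202, 220, 207, 194, 177, 193, 202, 208, 233]
after  = [170, 179, 203, 192, 172, 161, 174, 187, 186, 204]
losses = [b - a for b, a in zip(before, after)]

n = len(losses)
d_bar = mean(losses)            # 19.7
s_d = stdev(losses)             # about 4.4 (sample standard deviation)
se = s_d / sqrt(n)              # about 1.39

t_stat = (d_bar - 17) / se      # about 1.94
t_crit = t.ppf(0.95, df=n - 1)  # 1.833 for a right tailed test at the .05 level

print("reject H0" if t_stat > t_crit else "accept H0", f"(t = {t_stat:.2f})")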

Tests of proportions:
Two production processes are under review.
Proportion of defectives: p1 = .02, n1 = 100; p2 = .025, n2 = 100.
Are the processes equally efficient? α = .05
H0 : p1 – p2 = 0
H1 : p1 – p2 ≠ 0
σ(p1 bar – p2 bar) = square root of ( p1q1/n1 + p2q2/n2 )
Proceed as before.
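A minimal sketch of this two-process comparison, using the standard error formula given just above (sample proportions plugged in for p1 and p2):

from math import sqrt
from scipy.stats import norm

p1, n1 = 0.020, 100
p2, n2 = 0.025, 100
alpha = 0.05

se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)   # about 0.021
z = (p1 - p2) / se                                   # about -0.24
z_crit = norm.ppf(1 - alpha / 2)                     # 1.96

print("accept H0: the processes appear equally efficient" if abs(z) <= z_crit
      else "reject H0", f"(z = {z:.2f})")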

Probe values: 'p' values

When it is difficult to assign a value to α, how do we judge hypotheses?
We ask: how unlikely is it to get a result like the one we got from the
sample?
The maximum value of α for which the hypothesis will be accepted is
called its PROBE VALUE.

Ex: µ = 12; σ = .3; x bar = 12.25; n = 9 (cheese pieces example).
Standard error = .3 ÷ √9 = .1; the standard score of x bar is 2.5;
P(x bar > 12.25) = .0062.
Probe value = .0062 × 2 = .0124, i.e., 1.24% (two tailed test). Had this been
a one tailed test, the 'p' value would have been only .0062.
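A minimal sketch of the probe value calculation for the cheese pieces example:

from math import sqrt
from scipy.stats import norm

mu_h0, sigma, n, x_bar = 12, 0.3, 9, 12.25

se = sigma / sqrt(n)              # 0.1
z = (x_bar - mu_h0) / se          # 2.5
p_one_tailed = norm.sf(z)         # about .0062
p_two_tailed = 2 * p_one_tailed   # about .0124, i.e. 1.24%

print(p_one_tailed, p_two_tailed)
# H0 is accepted at any alpha below the probe value and rejected above it.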

Differences between two tailed and one tailed tests: probe value.

Sometimes the exact 'p' value cannot be found from the tables. If σ were
not known in the above example, we would use a 't' distribution with n − 1
degrees of freedom and the 't' table. At times this does not give an exact
value. For example, µH0 = 50, x bar = 49.2, s = 1.4 and n = 16:
Standard error = 1.4 ÷ √16 = .35.
So, t = (49.2 − 50) ÷ .35 = −2.286.
From the 't' table, this value lies between the α values 0.02 and 0.05.

Computer packages take care of this drawback.
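As an illustration of what such a package does, a minimal sketch that recovers the exact 'p' value for the example above from scipy's t distribution:

from math import sqrt
from scipy.stats import t

mu_h0, x_bar, s, n = 50, 49.2, 1.4, 16

se = s / sqrt(n)               # 0.35
t_stat = (x_bar - mu_h0) / se  # about -2.29
p_two_tailed = 2 * t.sf(abs(t_stat), df=n - 1)

print(f"t = {t_stat:.3f}, exact two tailed p = {p_two_tailed:.4f}")
# The printed table could only say that the p value lies between 0.02 and 0.05.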
