
Use of Non-Parametric Tests

Unit 9
Non-Parametric Tests
In the majority of hypothesis tests, inferences about population parameters, such as the
mean and the proportion, are made under restrictive assumptions about the populations
from which we draw our samples. We assume that our samples either are large or
come from normally distributed populations. But populations are not always normal.

Fortunately, in recent times statisticians have developed useful techniques that do not
make restrictive assumptions about the shape of population distributions. These are
known as distribution-free or, more commonly, nonparametric tests.

The hypotheses of a nonparametric test are concerned with something other than the
value of a population parameter.
Non-Parametric Tests
1. The sign test for paired data - positive or negative signs are substituted for
quantitative values.

2. Mann-Whitney U test - a rank sum test used to determine whether two
independent samples have been drawn from the same population.

3. Kruskal-Wallis test - A rank sum test which generalizes the analysis of variance to
enable us to dispense with the assumption that the populations are normally
distributed.
Non-Parametric Tests
4. The one-sample runs test – used for determining the randomness with which
sampled items have been selected.

5. Rank correlation - a method for doing correlation analysis when the data are not
available to use in numerical form.

6. The Kolmogorov–Smirnov test – a method for determining the goodness of fit
between an observed sample and a theoretical probability distribution.
Advantages of Non-Parametric Method
1. They do not require us to assume that a population is distributed in the shape
of a normal curve or another specific shape.

2. Generally, they are easier to do and to understand.

3. Sometimes even formal ordering or ranking is not required. Often, all we can do
is describe one outcome as “better” than another.
Disadvantages of Non-Parametric Method
1. They ignore a certain amount of information.

2. They are not as efficient or sharp as parametric tests.


Sign Tests for Paired Data
The sign test is based on the direction of a pair of observations, not on their numerical magnitude.

Example

Consider the results of a test panel of 40 college juniors evaluating the effectiveness of
two types of classes: large lectures by full-time professors and small sections by graduate
assistants.

The students responded to this request: “Indicate how you rate the effectiveness in transmitting
knowledge of these two types of classes by giving them a number from 4 to 1. A rating
of 4 is excellent and 1 is poor.”
Sign Tests for Paired Data

The sign test can help us determine whether students feel there is a difference between the
effectiveness of the two types of classes. We convert the rating of the two teaching methods
into signs. Here a plus sign means the student prefers large lectures, a minus sign indicates a
preference for small sections, and a zero represents a tie (no preference).
Sign Tests for Paired Data
Number of + signs: 19
Number of − signs: 11
Number of 0s: 10
Total sample size: 40
Hypotheses
Because we are testing perceived differences, we shall exclude tie evaluations (0s).
If there is no difference between the two types of classes, p (the probability that the first
score exceeds the second score) would be 0.5, and we would expect to get about 15 plus
signs and 15 minus signs. We would set up our hypotheses like this:
H0: p = 0.5 Null hypothesis: There is no difference between the two types of classes.

H1: p ≠ 0.5 Alternative hypothesis: There is a difference between the two types of classes.
Sign Tests for Paired Data
This case follows a binomial distribution, and we can use the normal distribution as an approximation
of the binomial (when np and nq are each at least 5, we can use the normal distribution to
approximate the binomial).

Setting the problem up symbolically:

p̄ = 19/30 = 0.633 (sample proportion of plus signs)
p_H0 = 0.5, q_H0 = 0.5 (hypothesized proportions)
n = 30 (sample size, with ties excluded)
Sign Tests for Paired Data
Testing the Hypothesis of No Difference

The standard error of the proportion is

σ_p̄ = √(p_H0 × q_H0 / n) = √(0.5 × 0.5 / 30) ≈ 0.091

and the standardized sample proportion is

z = (p̄ − p_H0) / σ_p̄ = (0.633 − 0.5) / 0.091 ≈ 1.462

Because we want to know whether the true proportion is larger or smaller than the
hypothesized proportion, this is a two-tailed test.

Placing this standardized value, 1.462, on the z scale shows that the sample proportion falls well within the acceptance
region. Therefore, the chancellor should accept the null hypothesis that students perceive no difference between the
two types of classes.
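As a worked check of the arithmetic above, here is a minimal Python sketch of the sign test using the panel’s counts. The variable names are my own, and the 0.05 significance level is assumed from the acceptance-region discussion.

```python
import math

# Sign test via the normal approximation to the binomial
# (appropriate here because n*p and n*q are both at least 5).
plus, minus, ties = 19, 11, 10   # panel responses from the slides
n = plus + minus                 # ties are excluded, so n = 30
p_hat = plus / n                 # observed proportion of + signs, about 0.633
p0 = 0.5                         # hypothesized proportion under H0

se = math.sqrt(p0 * (1 - p0) / n)   # standard error of the proportion, about 0.091
z = (p_hat - p0) / se               # standardized value, about 1.46

z_crit = 1.96                       # two-tailed critical value at the 0.05 level
print(f"z = {z:.3f}")
if abs(z) <= z_crit:
    print("Fail to reject H0: no perceived difference between class types")
else:
    print("Reject H0: students perceive a difference")
```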
THE ONE-SAMPLE RUNS TEST
To allow us to test samples for the randomness of their order, statisticians have developed the theory
of runs. A run is a sequence of identical occurrences preceded and followed by different occurrences
or by none at all. For example, if men (M) and women (W) enter a supermarket in the order
M M W W W M, the sequence contains three runs: MM, WWW, and M.
THE ONE-SAMPLE RUNS TEST
A manufacturer of breakfast cereal uses a machine to insert randomly one of two types of toys in
each box. The company wants randomness so that every child in the neighborhood does not get the
same toy. Testers choose samples of 60 successive boxes to see whether the machine is properly
mixing the two types of toys. Using the symbols A and B to represent the two types of toys, a tester
reported that one such batch looked like this:
B A B B B A A A B B A B B B B A A A A B
A B A A B B B A A B A A A A B B A B B A
A A A B B A B B B B A A B B A B A A B B
THE ONE-SAMPLE RUNS TEST
The number of runs, or r, is a statistic with its own special sampling distribution and its own
test. Statisticians can prove that too many or too few runs in a sample indicate that
something other than chance was at work when the items were selected. A one-sample
runs test, then, is based on the idea that too few or too many runs show that the items
were not chosen randomly.
THE ONE-SAMPLE RUNS TEST
The mean of the sampling distribution of the r statistic is

μ_r = 2n₁n₂ / (n₁ + n₂) + 1

and the standard error of the r statistic can be calculated with this formidable-looking
formula:

σ_r = √[ 2n₁n₂(2n₁n₂ − n₁ − n₂) / ((n₁ + n₂)²(n₁ + n₂ − 1)) ]

where n₁ and n₂ are the numbers of occurrences of each type of toy in the sample.
THE ONE-SAMPLE RUNS TEST

Because too many or too few runs would indicate that the process by which the toys are
inserted into the boxes is not random, a two-tailed test is appropriate.

For the reported batch, n₁ = 29 A’s, n₂ = 31 B’s, and r = 29 runs, which gives
μ_r ≈ 30.97, σ_r ≈ 3.84, and z = (29 − 30.97) / 3.84 ≈ −0.51. The figure shows that this
value falls well within the critical values for this test.

Therefore, management should accept the null hypothesis and conclude from this test that toys are being
inserted in boxes in random order.
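A minimal Python sketch of the runs-test computation for the reported batch, using the formulas above. The helper names are mine, and the ±1.96 comparison assumes a 0.05 significance level.

```python
import math

# One-sample runs test for the cereal-box batch reported above.
seq = ("BABBBAAABBABBBBAAAAB"
       "ABAABBBAABAAAABBABBA"
       "AAABBABBBBAABBABAABB")

n1 = seq.count("A")   # occurrences of toy A
n2 = seq.count("B")   # occurrences of toy B
# A new run starts at every change of symbol, so count the change points.
r = 1 + sum(1 for a, b in zip(seq, seq[1:]) if a != b)

mu_r = 2 * n1 * n2 / (n1 + n2) + 1          # mean of the sampling distribution of r
sigma_r = math.sqrt(
    2 * n1 * n2 * (2 * n1 * n2 - n1 - n2)
    / ((n1 + n2) ** 2 * (n1 + n2 - 1))
)                                            # standard error of r
z = (r - mu_r) / sigma_r                     # well inside +/-1.96 at the 0.05 level

print(f"n1 = {n1}, n2 = {n2}, r = {r}")          # n1 = 29, n2 = 31, r = 29
print(f"mean = {mu_r:.2f}, se = {sigma_r:.2f}, z = {z:.2f}")
```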
Rank Correlation
The correlation coefficient is a measure of the closeness of association between two
variables.

When information is not available in the form of numerical values, we can assign
rankings to the items in each of the two variables we are studying, and a rank-
correlation coefficient can be calculated.

This is a measure of the correlation that exists between the two sets of ranks, a measure
of the degree of association between the variables that we would not have been able to
calculate otherwise.
Rank Correlation

The coefficient of rank correlation is

rs = 1 − (6 Σd²) / (n(n² − 1))

where
rs = coefficient of rank correlation
n = number of paired observations
d = difference between the ranks for each pair of observations
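A minimal sketch of this formula in Python. The ranks below are invented purely for illustration; they are not the city data from the example that follows.

```python
def spearman_rs(rank_x, rank_y):
    """Coefficient of rank correlation: rs = 1 - 6*sum(d^2) / (n*(n^2 - 1))."""
    n = len(rank_x)
    d_squared = sum((x - y) ** 2 for x, y in zip(rank_x, rank_y))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))

# Invented ranks for two variables across 11 items, purely for illustration.
rank_x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
rank_y = [2, 1, 4, 3, 5, 7, 6, 9, 8, 11, 10]
print(round(spearman_rs(rank_x, rank_y), 4))   # 0.9545: strongly similar rankings
```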
Rank Correlation
Example:
An international organization has decided to make a preliminary investigation of average
year-round quality of air and the incidence of pulmonary-related diseases. A preliminary
study ranked 11 of the world’s major cities from 1 (worst) to 11 (best) on these two
variables.
Rank Correlation
Null hypothesis: There is no correlation in the ranked data of the population.
Alternative hypothesis: There is a correlation in the ranked data of the population.

A 0.05 level of significance is chosen for testing these hypotheses.

For small values of n (n less than or equal to 30), the distribution of rs is not normal,
and, unlike other small-sample statistics we have encountered, it is not appropriate to
use the t distribution for testing hypotheses about the rank-correlation coefficient.
Instead, we use the table of Spearman’s rank correlation values (Appendix Table 7) to
determine the acceptance and rejection regions for such hypotheses.
Rank Correlation

A two-tailed test is appropriate, so we look at the Spearman’s rank correlation table.

In the row for n = 11 (the number of cities) and the column for a significance level of
0.05, we find the critical values for rs are ±0.6091.

The upper limit of the acceptance region is 0.6091, and the lower limit is −0.6091.
Rank Correlation

The coefficient of rank correlation for the 11 cities works out to 0.736. Because 0.736 lies outside the
acceptance region (beyond the upper limit of 0.6091), we reject the null hypothesis of no correlation.
A correlation coefficient of 0.736 suggests a substantial positive association between average air quality and
the occurrence of pulmonary disease, at least in the 11 cities sampled; that is, high levels of pollution go with
a high incidence of pulmonary disease.
Rank Correlation
If the sample size is greater than 30, we can no longer use Appendix Table 7. However, when n is
greater than 30, the sampling distribution of rs is approximately normal, with a mean of zero and a
standard deviation of

σ_rs = 1 / √(n − 1)
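A minimal sketch of the large-sample test, assuming hypothetical values for n and rs since no large-sample data appear in these slides.

```python
import math

# Large-sample test (n > 30): under H0, rs is approximately normal
# with mean 0 and standard deviation 1/sqrt(n - 1).
def spearman_z(rs, n):
    sigma_rs = 1 / math.sqrt(n - 1)   # standard error of rs for large n
    return rs / sigma_rs              # standardized rank-correlation coefficient

# Hypothetical values for illustration: 40 paired observations, rs = 0.35.
z = spearman_z(0.35, 40)
print(f"z = {z:.2f}")   # compare with +/-1.96 for a two-tailed test at the 0.05 level
```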
Rank Correlation - Example

We see that the rank correlation


coefficient lies far outside the
acceptance region. Therefore,
we would reject the null
hypothesis of no correlation and
conclude that bright people
tend to chose bright spouses.
